is something like 'core local storage' possible? - linux

I know there's thread local storage (TLS) for threads, however, is there something like core local storage for each core in multi-core environment ?

Since a thread can be moved from core to core without warning, there doesn't seem to be any obvious use for such a capability.

Does it exist? As others have said there's really no point for it that I can see.
But can you build something like that? Oh very simple just grab some memory and index it with the cpu.id, there's no magic behind thread local storage either.
If you're trying to improve performance, just consider that thread local storage well be exclusively in one cores cache which is pretty much just as well and has the advantage of adapting to changing cores cheaply.

Related

System.OutOfMemoryException error

We are working on a rich client application in which many threads are running as well third party controls are used, after running application for 1 hour it starts giving error of 'System.OutOfMemoryException' unless and until we restart the application, i have search many sites for help but no particular and specified reason is giving.
Thanks.
It sounds pretty self-explanatory, you're system doesn't have enough memory. If you're still running the application as 32-bit then moving to 64-bit might solve the problem. I had exactly that problem on a server-2008-r2 recently, and moving to 64 bit did solve my problem. But if you're already 64 bit then perhaps the server doesn't have enough physical memory. In which case, you need to add more memory, or work out how to make your application less memory hungry. There could be objects that could be discarded that it's keeping references to, etc, and if that's the case you should try profiling to try and identify what's hogging the most memory. Beyond that, does the application use any unmanaged DLLs, e.g. COM objects written in C++ or similar. Maybe there's a memory leak outside of the managed framework?
I recommend using a profiler to identify and find where does the high memory consumption come from.

Is there a way to share memory among workers/threads/something in Node.JS?

I have a Node app which accesses a static, large (>100M), complex, in-memory data structure, accepts queries, and then serves out little slices of that data to the client over HTTP.
Most queries can be answered in tenths of a second. Hurray for Node!
But, for certain queries, searching this data structure takes a few seconds. This sucks because everyone else has to wait.
To serve more clients efficiently, I would like to use some sort of parallelism.
But, because this data structure is so large, I would like to share it among the workers or threads or what have you, so I don't burn hundreds of megabytes. This would be perfectly safe, because the data structure is not going to be written to. A typical 'fork()' in any other language would do it.
However, as far as I can tell, all the standard ways of doing parallelism in Node explicitly make this impossible. For safety, they don't want you to share anything.
But is there a way?
Background:
It is impractical to put this data structure in a database, or use memcached, or anything like that.
WebWorker API libraries and similar only allow short serialized messages to be passed in and out of the workers.
Node's Cluster uses a call named 'fork', but it is not really a fork of the existing process, it is spawning a new one. So once again, no shared memory.
Probably the really correct answer would be to use filesystem-like access to shared memory, aka tmpfs, or mmap. There are some node libraries that make mount() and mmap() available for exactly something like this. Unfortunately then one has to implement complex data structure access on top of synchronous seeks and reads. My application uses arrays of arrays of dicts and so on. It would be nice to not have to reimplement all that.
I tried write a C/C++ binding of shared memory access from nodejs. https://github.com/supipd/node-shm
Still work in progress (but working for me), maybe usefull, if bug or suggestion, inform me.
building with waf is old style (node 0.6 and below), new build is with gyp.
You should look at node cluster (http://nodejs.org/api/cluster.html). Not clear this is going to help you without having more details, but this runs multiple node processes on the same machine using fork.
Actually Node does support spawning processes. I'm not sure how close Node's fork is to real fork, but you can try it:
http://nodejs.org/api/child_process.html#child_process_child_process_fork_modulepath_args_options
By the way: it is not true that Node is unsuited for that. It is as suited as any other language/web server. You can always fire multiple instances of your server on different ports and put a proxy in front.
If you need more memory - add more memory. :) It is as simple as that. Also you should think about putting all of that data on a dedicated in-memory database like Redis or Memcached ( or even Couchbase if you need complex queries ). You won't have to worry about duplicating that data any more.
Most web applications spend the majority of their life waiting for network buffers and database reads. Node.js is designed to excel at this io bound work. If your work is truly bound by the CPU, you might be served better by another platform.
With that out of the way...
Use process.nextTick (perhaps even nested blocks) to make sure that expensive CPU work is properly asynchronous and not allowed to block your thread. This will make sure one client making expensive requests doesn't negatively impact all the others.
Use node.js cluster to add a worker process for each CPU in the system. Worker processes can all bind to a single HTTP port and use Memcached or Redis to share memory state. Workers also have a messaging API that can be used to keep an in-process memory cache synchronized, however it has some consistency limitations.

Simulating Disk I/O

We are currently evaluating storage for a virtualization environment (Xen). The storage is an active-active cluster and I need to test some stuff there like split brain scenarios etc.
I'm looking for a tool that simulates a lot of small disk I/O, like a virtual machine would read/write to its image file.
I don't need performance testing tools but more something like data integrity. Is there anything around?
btest[1] is able to do such things, it can read/write in random/sequential and can also verify the data afterwards.
[1] http://sourceforge.net/projects/btest/

mmap file shared via nfs?

Scenario A:
To share a read/write block of memory between two processes running on the same host, Joe mmaps the same local file from both processes.
Scenario B:
To share a read/write block of memory between two processes running on two different hosts, Joe shares a file via nfs between the hosts, and then mmaps the shared file from both processes.
Has anyone tried Scenario B? What are the extra problems that arise in Scenario B that do not apply to Scenario A?.
Mmap will not share data without some additional actions.
If you change data in mmaped part of file, changes will be stored only in memory. They will not be flushed to the filesystem (local or remote) until msync or munmap or close or even decision of OS kernel and its FS.
When using NFS, locking and storing of data will be slower than if using local FS. Timeouts of flushing and time of file operations will vary too.
On the sister site people says that NFS may have poor caching policy, so there will be much more I/O requests to the NFS server comparing I/O request count to local FS.
You will need byte-range-lock for correct behavior. They are available in NFS >= v4.0.
I'd say scenario B has all kinds of problems (assuming it works as suggested in the comments). The most obvious is the standards concurrency issues - 2 processes sharing 1 resource with no form of locking etc. That could lead to problems... Not sure whether NFS has its own peculiar quirks in this regard or not.
Assuming you can get around the concurrency issues somehow, you are now reliant on maintaining a stable (and speedy) network connection. Obviously if the network drops out, you might miss some changes. Whether this matters depends on your architecture.
My thought is it sounds like an easy way to share a block of memory on different machines, but I can't say I've heard of it being done which makes me think it isn't so good. When I think sharing data between procs, I think DBs, messaging or a dedicated server. In this case if you made one proc the master (to handle concurrency and owning the concept -i.e. whatever this guy says is the best copy of the data) it might work...

User-Contributed Code Security

I've seen some websites that can run code from the browser, and the code is evaluated on the server.
What is the security best-practice for applications that run user-contributed code? Besides of accessing and changing the server's sensitive information.
(for example, using a Python with a stripped-down version of the standard library)
How to prevent DoS like non-halting and/or CPU-intensive programs? (we can't use static code analysis here) What about DoSing the type check system?
Python, Prolog and Haskell are suggested examples to talk about.
The "best practice" (am I really the only one who hates that phrase?) is probably just not to do it at all.
If you really must do it, set it up to run in a virtual machine (and I don't mean something like a JVM; I mean something that hosts an OS) so it's easy to restore the VM from a snapshot (or whatever the VM in question happens to call it).
In most cases, you'll need to go a bit beyond just that though. Without some extra work to lock it down, even a VM can use enough resources to reduce responsiveness so it can be difficult to kill and restart it (you usually can eventually, but "eventually" is rarely what you want). You also generally want to set some quotas to limit its total CPU usage, probably limit it to using a single CPU (and run it on a machine with at least two), limit its total memory usage, etc. In Windows, for example, you can do (at least most of that) by starting the VM in a job object, and limiting the resources available to the job object.

Resources