I'm trying to build a Node application using worker threads, divided into three parts:
The primary thread that delegates tasks
A dedicated worker thread that updates shared data
A pool of worker threads that run calculations on shared data
The shared data is in the form of several SharedArrayBuffer objects operating like a pseudo-database. I would like to be able to update the data without needing to pause calculations, and I'm ok with a few tasks using slightly stale data. The flow I've come up with is:
Primary thread passes data to update thread
Update thread creates a whole new SharedArrayBuffer and populates it with updated data.
Update thread returns a pointer to the new buffer back to primary thread.
Primary thread caches the latest pointer in a variable, overwriting its previous value, and passes it to each worker thread with each task.
Worker threads don't retain these pointers at all after executing their operations.
The problem is, this seems to create a memory leak: when I run a prototype that frequently makes updates and swaps out the shared buffers, the resident set size (RSS) keeps growing. Garbage collection appears to make a couple of passes removing the discarded buffers, but then memory climbs continuously until the application slows and eventually hangs or crashes.
How can I guarantee that a SharedArrayBuffer will get picked up by garbage collection when I'm done with it, or is that even possible? I've seen hints to the effect that as long as all references to it are removed from all threads it will eventually be collected, but I haven't found a clear answer.
I'm using the threads.js library to abstract the worker thread operations. Here's a summary of my prototype:
app.ts:
import { ModuleThread, Pool, spawn, Worker } from "threads";
import { WriterModule } from "./workers/writer-worker";
import { CalculateModule } from "./workers/calculate-worker";
class App {
calculatePool = Pool<ModuleThread<CalculateModule>>
(() => spawn(new Worker('./workers/calculate-worker')), { size: 6 });
writerThread: ModuleThread<WriterModule>;
sharedBuffer: SharedArrayBuffer;
dataView: DataView;
constructor() {
this.sharedBuffer = new SharedArrayBuffer(1000000);
this.dataView = new DataView(this.sharedBuffer);
}
async start(): Promise<void> {
this.writerThread = await spawn<WriterModule>(new Worker('./workers/writer-worker'));
await this.writerThread.init(this.sharedBuffer);
await this.update();
// Arbitrary delay between updates
setInterval(() => this.update(), 5000);
while (true) {
// Arbitrary delay between tasks
await new Promise<void>(resolve => setTimeout(() => resolve(), 250));
this.calculate();
}
}
async update(): Promise<void> {
const updates: any[] = [];
// generates updates
this.sharedBuffer = await this.writerThread.update(updates);
this.dataView = new DataView(this.sharedBuffer);
}
async calculate(): Promise<void> {
const task = this.calculatePool.queue(async (calc) => calc.calculate(this.sharedBuffer));
const sum: number = await task;
// Use result
}
}
const app = new App();
app.start();
writer-worker.ts:
import { expose } from "threads";
let sharedBuffer: SharedArrayBuffer;
const writerModule = {
async init(startingBuffer: SharedArrayBuffer): Promise<void> {
sharedBuffer = startingBuffer;
},
async update(data: any[]): Promise<SharedArrayBuffer> {
// Arbitrary update time
await new Promise<void>(resolve => setTimeout(() => resolve(), 500));
const newSharedBuffer = new SharedArrayBuffer(1000000);
// Copy some values from the old buffer over, perform some mutations, etc.
sharedBuffer = newSharedBuffer;
return sharedBuffer;
},
}
export type WriterModule = typeof writerModule;
expose(writerModule);
calculate-worker.ts:
import { expose } from "threads";
const calculateModule = {
async calculate(sharedBuffer: SharedArrayBuffer): Promise<number> {
const view = new DataView(sharedBuffer);
// Arbitrary calculation time
await new Promise<void>(resolve => setTimeout(() => resolve(), 100));
// Run arbitrary calculation (placeholder read so the sketch compiles)
const sum = view.getFloat64(0);
return sum;
}
}
export type CalculateModule = typeof calculateModule;
expose(calculateModule);
There's an async method in an external library which is not thread-safe (it corrupts its internal state when called concurrently) and is very unlikely to be fixed in the near future. Yet I want the rest of my codebase to freely use it in any possible parallel vs. sequential combination (in other words, I want my methods that use that method to still be thread-safe).
So I need to wrap that problematic method with some Active Object implementation, so that its invocations are always serialized (executed one after another).
My take on this is to use a chain of Promises:
var workQueue: Promise<any> = Promise.resolve();
function executeSequentially<T>(action: () => Promise<T>): Promise<T> {
return new Promise((resolve, reject) => {
workQueue = workQueue.then(() => {
return action().then(resolve, reject);
});
});
}
And then instead of directly calling that method:
const myPromise = asyncThreadUnsafeMethod(123);
I'll be calling it like this:
const myPromise = executeSequentially(() => asyncThreadUnsafeMethod(123));
But am I missing something? Any potential problems with this approach? Any better ideas?
There are no problems with this approach; using a promise queue is a standard pattern that will solve your problem.
While your code does work fine, I would recommend avoiding the Promise constructor antipattern:
let workQueue: Promise<void> = Promise.resolve();
const ignore = _ => {};
function executeSequentially<T>(action: () => Promise<T>): Promise<T> {
const result = workQueue.then(action);
workQueue = result.then(ignore, ignore);
return result;
}
Also, you say that you'd be "calling the method" like executeSequentially(() => asyncThreadUnsafeMethod(123));, but this leaves the potential for mistakes where you forget the executeSequentially wrapper and call the unsafe method directly. Instead I'd recommend introducing an indirection: wrap the entire safe call in a function and export only that, so that your codebase only ever interacts with that facade and never with the library itself. To help with creating such a facade, I'd curry the executeSequentially function:
const ignore = _ => {};
function makeSequential<T, A extends unknown[]>(fn: (...args: A) => Promise<T>): (...args: A) => Promise<T> {
let workQueue: Promise<void> = Promise.resolve();
return (...args) => {
const result = workQueue.then(() => fn(...args));
workQueue = result.then(ignore, ignore);
return result;
};
}
export const asyncSafeMethod = makeSequential(asyncThreadUnsafeMethod);
or just directly implement it for that one case:
const ignore = _ => {};
let workQueue: Promise<void> = Promise.resolve();
export function asyncSafeMethod(x: number) {
const result = workQueue.then(() => asyncThreadUnsafeMethod(x));
workQueue = result.then(ignore, ignore);
return result;
}
import { asyncSafeMethod } from …;
asyncSafeMethod(123);
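To see the serialization in action, here is a minimal self-contained sketch; slowOperation is a made-up stand-in for the unsafe call (not the real asyncThreadUnsafeMethod) and the timings are arbitrary:
// Hypothetical stand-in for the thread-unsafe call, purely for illustration.
let active = 0;
async function slowOperation(x: number): Promise<number> {
  active++;
  if (active > 1) throw new Error("concurrent call detected");
  await new Promise<void>(resolve => setTimeout(resolve, 50));
  active--;
  return x * 2;
}

const safeOperation = makeSequential(slowOperation);

// All three calls start at once, but they run strictly one after another,
// so the "concurrent call detected" error never fires.
Promise.all([safeOperation(1), safeOperation(2), safeOperation(3)])
  .then(results => console.log(results)); // [2, 4, 6]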
I'm using an Observable to provide event subscription interface for clients from a global resource, and I need to manage that resource according to the number of active subscriptions:
Allocate global resource when the number of subscriptions becomes greater than 0
Release global resource when the number of subscriptions becomes 0
Adjust the resource usage strategy based on the number of subscriptions
What is the proper way in RXJS to monitor the number of active subscriptions?
How to implement the following within RXJS syntax? -
const myEvent: Observable<any> = new Observable();
myEvent.onSubscription((newCount: number, prevCount: number) => {
if(newCount === 0) {
// release global resource
} else {
// allocate global resource, if not yet allocated
}
// for a scalable resource usage / load,
// re-configure it, based on newCount
});
I wouldn't expect a guaranteed notification on each change, hence newCount + prevCount params.
UPDATE-1
This is not a duplicate of this, because I need to be notified when the number of subscriptions changes, and not just get the counter at some point.
UPDATE-2
Without any answer so far, I quickly came up with a very ugly and limited work-around, through complete encapsulation, and specifically for type Subject. I'm still very much hoping to find a proper solution.
UPDATE-3
After a few answers, I'm still not sure how to implement what I'm trying to do, which is the following:
class CustomType {
}
class CountedObservable<T> extends Observable<T> {
private message: string; // random property
public onCount; // magical Observable that needs to be implemented
constructor(message: string) {
// super(); ???
this.message = message;
}
// random method
public getMessage() {
return this.message;
}
}
const a = new CountedObservable<CustomType>('hello'); // can create directly
const msg = a.getMessage(); // can call methods
a.subscribe((data: CustomType) => {
// handle subscriptions here;
});
// need that magic onCount implemented, so I can do this:
a.onCount.subscribe((newCount: number, prevCount: number) => {
// manage some external resources
});
How to implement such class CountedObservable above, which would let me subscribe to itself, as well as its onCount property to monitor the number of its clients/subscriptions?
UPDATE-4
All suggested solutions seemed overly complex, and even though I accepted one of the answers, I ended up with a completely custom solution of my own.
You could achieve it using defer to track subscriptions and finalize to track completions, e.g. as an operator:
// a custom operator that will count number of subscribers
function customOperator(onCountUpdate = noop) {
return function refCountOperatorFunction(source$) {
let counter = 0;
return defer(()=>{
counter++;
onCountUpdate(counter);
return source$;
})
.pipe(
finalize(()=>{
counter--;
onCountUpdate(counter);
})
);
};
}
// just a stub for `onCountUpdate`
function noop(){}
And then use it like:
const source$ = new Subject();
const result$ = source$.pipe(
customOperator( n => console.log('Count updated: ', n) )
);
Here's a code snippet illustrating this:
const { Subject, of, timer, pipe, defer } = rxjs;
const { finalize, takeUntil } = rxjs.operators;
const source$ = new Subject();
const result$ = source$.pipe(
customOperator( n => console.log('Count updated: ', n) )
);
// emit events
setTimeout(()=>{
source$.next('one');
}, 250);
setTimeout(()=>{
source$.next('two');
}, 1000);
setTimeout(()=>{
source$.next('three');
}, 1250);
setTimeout(()=>{
source$.next('four');
}, 1750);
// subscribe and unsubscribe
const subscriptionA = result$
.subscribe(value => console.log('A', value));
setTimeout(()=>{
result$.subscribe(value => console.log('B', value));
}, 500);
setTimeout(()=>{
result$.subscribe(value => console.log('C', value));
}, 1000);
setTimeout(()=>{
subscriptionA.unsubscribe();
}, 1500);
// complete source
setTimeout(()=>{
source$.complete();
}, 2000);
function customOperator(onCountUpdate = noop) {
return function refCountOperatorFunction(source$) {
let counter = 0;
return defer(()=>{
counter++;
onCountUpdate(counter);
return source$;
})
.pipe(
finalize(()=>{
counter--;
onCountUpdate(counter);
})
);
};
}
function noop(){}
<script src="https://unpkg.com/rxjs@6.4.0/bundles/rxjs.umd.min.js"></script>
* NOTE: if your source$ is cold — you might need to share it.
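For example, a minimal sketch of sharing a cold source before counting it, reusing the customOperator defined above (the interval source here is just an illustrative assumption):
import { interval } from "rxjs";
import { share, take } from "rxjs/operators";

// Without share(), every counted subscriber would trigger its own execution of the cold source.
const cold$ = interval(1000).pipe(take(5));

const counted$ = cold$.pipe(
  share(), // one shared execution of the interval
  customOperator(n => console.log('Count updated: ', n))
);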
Hope it helps
You are really asking three separate questions here, and I question whether you really need the full capability that you mention. Since most of the resource management you are asking for is already provided by the library, writing custom tracking code seems redundant. The first two questions:
Allocate global resource when the number of subscriptions becomes greater than 0
Release global resource when the number of subscriptions becomes 0
Can be done with the using + share operators:
class ExpensiveResource {
constructor () {
// Do construction
}
unsubscribe () {
// Do Tear down
}
}
// Creates a resource and ties its lifecycle with that of the created `Observable`
// generated by the second factory function
// Using will accept anything that is "Subscription-like", meaning it has an unsubscribe function.
const sharedStream$ = using(
// Creates an expensive resource
() => new ExpensiveResource(),
// Passes that expensive resource to an Observable factory function
er => timer(1000)
)
// Share the underlying source so that global creation and deletion are only
// processed when the subscriber count changes between 0 and 1 (or vice versa)
.pipe(share())
After that sharedStream$ can be passed around as a base stream which will manage the underlying resource (assuming you implemented your unsubscribe correctly) so that the resource will be created and torn down as the number of subscribers transitions between 0 and 1.
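For instance, a quick usage sketch (assuming your ExpensiveResource logs from its constructor and its unsubscribe so the transitions are visible):
const subA = sharedStream$.subscribe(v => console.log('A', v)); // first subscriber: resource gets created
const subB = sharedStream$.subscribe(v => console.log('B', v)); // second subscriber: same resource is reused

subA.unsubscribe(); // resource stays alive, B is still subscribed
subB.unsubscribe(); // last subscriber gone: ExpensiveResource.unsubscribe() runs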
Adjust the resource usage strategy based on the number of subscriptions
The third question is the one I am most dubious about, but I'll answer it for completeness, assuming you know your application better than I do (since I can't think of a reason why you would need specific handling at different usage levels other than going between 0 and 1).
Basically I would use a similar approach as above, but I would encapsulate the transition logic slightly differently.
// Same as above
class ExpensiveResource {
unsubscribe() { console.log('Tear down this resource!')}
}
const usingReferenceTracking =
(onUp, onDown) => (resourceFactory, streamFactory) => {
let instance, refCount = 0
// Again manage the global resource state with using
const r$ = using(
// Unfortunately the using pattern doesn't let the resource escape the closure
// so we need to cache it for ourselves to use later
() => instance || (instance = resourceFactory()),
// Forward stream creation as normal
streamFactory
).pipe(
// Don't forget to clean up the stream after all is said and done
// Because it's behind a share this should only happen when all subscribers unsubscribe
finalize(() => instance = null),
share()
)
// Use defer to trigger "onSubscribe" side-effects
// Note as well that these side-effects could be merged with the above for improved performance
// But I prefer them separate for easier maintenance.
return defer(() => onUp(instance, refCount += 1) || r$)
// Use finalize to handle the "onFinish" side-effects
.pipe(finalize(() => onDown(instance, refCount -= 1)))
}
const referenceTracked$ = usingReferenceTracking(
(ref, count) => console.log('Ref count increased to ' + count),
(ref, count) => console.log('Ref count decreased to ' + count)
)(
() => new ExpensiveResource(),
ref => timer(1000)
)
referenceTracked$.pipe(take(1)).subscribe(x => console.log('Sub1 ' + x))
referenceTracked$.pipe(take(1)).subscribe(x => console.log('Sub2 ' + x))
// Ref count increased to 1
// Ref count increased to 2
// Sub1 0
// Ref count decreased to 1
// Sub2 0
// Ref count decreased to 0
// Tear down this resource!
Warning: One side effect of this is that by definition the stream will be warm once it leaves the usingReferenceTracking function, and it will go hot on first subscription. Make sure you take this into account during the subscription phase.
What a fun problem! If I am understanding what you are asking, here is my solution to this: create a wrapper class around Observable that tracks the subscriptions by intercepting both subscribe() and unsubscribe(). Here is the wrapper class:
export class CountSubsObservable<T> extends Observable<T>{
private _subCount = 0;
private _subCount$: BehaviorSubject<number> = new BehaviorSubject(0);
public subCount$ = this._subCount$.asObservable();
constructor(public source: Observable<T>) {
super();
}
subscribe(
observerOrNext?: PartialObserver<T> | ((value: T) => void),
error?: (error: any) => void,
complete?: () => void
): Subscription {
this._subCount++;
this._subCount$.next(this._subCount);
let subscription = super.subscribe(observerOrNext as any, error, complete);
const newUnsub: () => void = () => {
if (this._subCount > 0) {
this._subCount--;
this._subCount$.next(this._subCount);
subscription.unsubscribe();
}
}
subscription.unsubscribe = newUnsub;
return subscription;
}
}
This wrapper creates a secondary observable, .subCount$, that can be subscribed to and will emit every time the number of subscriptions to the source observable changes. It emits a number corresponding to the current number of subscribers.
To use it you would create a source observable and then call new with this class to create the wrapper. For example:
const source$ = interval(1000).pipe(take(10));
const myEvent$: CountSubsObservable<number> = new CountSubsObservable(source$);
myEvent$.subCount$.subscribe(numSubs => {
console.log('subCount$ notification! Number of subscriptions is now', numSubs);
if(numSubs === 0) {
// release global resource
} else {
// allocate global resource, if not yet allocated
}
// for a scalable resource usage / load,
// re-configure it, based on numSubs
});
myEvent$.subscribe(result => console.log('result is ', result));
To see it in use, check out this Stackblitz.
UPDATE:
Ok, as mentioned in the comments, I'm struggling a little to understand where the stream of data is coming from. Looking back through your question, I see you are providing an "event subscription interface". If the stream of data is a stream of CustomType as you detail in your third update above, then you may want to use fromEvent() from rxjs to create the source observable with which you would call the wrapper class I provided.
To show this I created a new Stackblitz. From that Stackblitz here is the stream of CustomTypes and how I would use the CountedObservable class to achieve what you are looking for.
class CustomType {
a: string;
}
const dataArray = [
{ a: 'January' },
{ a: 'February' },
{ a: 'March' },
{ a: 'April' },
{ a: 'May' },
{ a: 'June' },
{ a: 'July' },
{ a: 'August' },
{ a: 'September' },
{ a: 'October' },
{ a: 'November' },
{ a: 'December' }
] as CustomType[];
// Set up an arbitrary source that sends a stream of `CustomTypes`, one
// every two seconds by using `interval` and mapping the numbers into
// the associated dataArray.
const source$ = interval(2000).pipe(
map(i => dataArray[i]), // transform the Observable stream into CustomTypes
take(dataArray.length), // limit the Observable to only emit # array elements
share() // turn into a hot Observable.
);
const myEvent$: CountedObservable<CustomType> = new CountedObservable(source$);
myEvent$.onCount.subscribe(newCount => {
console.log('newCount notification! Number of subscriptions is now', newCount);
});
I hope this helps.
First of all, I very much appreciate how much time and effort people have committed trying to answer my question! And I am sure those answers will prove to be a useful guideline to other developers, solving similar scenarios with RXJS.
However, specifically for what I was trying to get out of RXJS, I found in the end that I am better off not using it at all. I specifically wanted the following:
A generic, easy-to-use interface for subscribing to notifications, plus monitoring subscriptions - all in one. With RXJS, the best I could end up with was a set of workarounds that appear needlessly convoluted, or even cryptic to developers who are not RXJS experts. That is not what I would consider a friendly interface; it feels more like over-engineering.
I ended up with a custom, much simpler interface that can do everything I was looking for:
export class Subscription {
private unsub: () => void;
constructor(unsub: () => void) {
this.unsub = unsub;
}
public unsubscribe(): void {
if (this.unsub) {
this.unsub();
this.unsub = null; // to prevent repeated calls
}
}
}
export class Observable<T = any> {
protected subs: ((data: T) => void)[] = [];
public subscribe(cb: (data: T) => void): Subscription {
this.subs.push(cb);
return new Subscription(this.createUnsub(cb));
}
public next(data: T): void {
// we iterate through a safe clone, in case an un-subscribe occurs;
// and since Node.js is the target, we are using process.nextTick:
[...this.subs].forEach(cb => process.nextTick(() => cb(data)));
}
protected createUnsub(cb) {
return () => {
this.subs.splice(this.subs.indexOf(cb), 1);
};
}
}
export interface ISubCounts {
newCount: number;
prevCount: number;
}
export class CountedObservable<T = any> extends Observable<T> {
readonly onCount: Observable<ISubCounts> = new Observable();
protected createUnsub(cb) {
const s = this.subs;
this.onCount.next({newCount: s.length, prevCount: s.length - 1});
return () => {
s.splice(s.indexOf(cb), 1);
this.onCount.next({newCount: s.length, prevCount: s.length + 1});
};
}
}
It is both small and elegant, and lets me do everything I needed to begin with, in a safe and friendly manner. I can do the same subscribe and onCount.subscribe, and get all the same notifications:
const a = new CountedObservable<string>();
const countSub = a.onCount.subscribe(({newCount, prevCount}) => {
console.log('COUNTS:', newCount, prevCount);
});
const sub1 = a.subscribe(data => {
console.log('SUB-1:', data);
});
const sub2 = a.subscribe(data => {
console.log('SUB-2:', data);
});
a.next('hello');
sub1.unsubscribe();
sub2.unsubscribe();
countSub.unsubscribe();
I hope this will help somebody else also.
P.S. I further improved it as an independent module.
I have the following code:
let bindings =
{
on : (messageName,callback) =>
{
bindings[messageName] = callback
}
}
bindings.on('test',(params) =>
{
setTimeout( () =>
{
console.log("call id " , params.callId)
},~~(Math.random()*100))
})
let data = {callId : 1 }
for (let i=0;i<5;i++)
{
bindings['test'](data)
data.callId++
}
It produces the output:
call id 6
call id 6
call id 6
call id 6
call id 6
I know this issue can be solved with bind (https://developer.mozilla.org/fr/docs/Web/JavaScript/Reference/Objets_globaux/Function/bind), but I cannot find the correct way to implement it while keeping the current design.
Adding a const fixes the issue, but I would like to find a more elegant/generic way to fix it:
bindings.on('test',(params) =>
{
const callId = params.callId
setTimeout( () =>
{
console.log("call id " , callId)
},~~(Math.random()*100))
})
This is less about the implementation of your event logic and more about invoking it with a reference that is being updated by the loop.
Since the implementation is asynchronous due to the internal setTimeout, the invocation loop finishes completely before the first event callback executes. Because the input params is a reference to the data object, by the time the loop has completed its callId value is 6, and that is the value every one of the async callbacks sees when it finally runs.
Essentially the for loop is queuing up 5 async operations where each one is using the same referenced object, data. The loop is also updating a property value of the data object for each iteration. Because of how JavaScript handles asynchronous operations, the loop will complete before any of the asynchronous operations begin because it is still following the initial synchronous execution of the script.
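To make that concrete, here is the same effect in isolation (the object and values are made up just for illustration):
const data = { callId: 1 };

// The callback closes over the object itself, not over the number inside it.
setTimeout(() => console.log("call id", data.callId), 0);

data.callId = 99;
// Logs "call id 99": by the time the timeout fires, the shared object has already been mutated.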
One way to get around this is to create a new input object for each loop iteration:
let bindings = {
on: (messageName, callback) => {
bindings[messageName] = callback
}
}
bindings.on('test', (params) => {
setTimeout(() => {
console.log("call id ", params.callId)
}, ~~(Math.random() * 100));
});
for (let i = 0; i < 5; i++) {
bindings['test']({callId: i + 1});
}
I have been trying to improve the performance of my code in regards to database queries. The problem I am currently running into is that I can't seem to find a way to get a new context for each subQuery.
Using the simplified code below will inconsistently generate a "The underlying provider failed on Open." exception.
using (var context = getNewContextObject())
{
var result = new SomeResultObject();
var parentQuery = context.SomeTable.Where(x => x.Name == "asdf");
Parallel.Invoke(() =>
{
result.count1 = parentQuery.Where(x => x.Amount >= 100 & x.Amount < 2000).Count();
}, () =>
{
result.count2 = parentQuery.Where(x => x.Amount < 100).Count();
}
, () =>
{
result.count3 = parentQuery.Where(x => x.Amount >= 2000).Count();
}
);
}
The only way around this so far seems to be to rebuild the entire query for each subQuery with a new context. Is there any way to avoid building each query from the bottom up with a new context? Can I instead just attach each subQuery to a new context? I am looking for something like the below.
Parallel.Invoke(() =>
{
var subQuery = parentQuery.Where(x => x.Amount >= 100 & x.Amount < 2000);
subQuery.Context = getNewContextObject();
result.count1 = subQuery.Count();
}, () =>
{
var subQuery = parentQuery.Where(x => x.Amount < 100);
subQuery.Context = getNewContextObject();
result.count2 = subQuery.Count();
}
, () =>
{
var subQuery = parentQuery.Where(x => x.Amount >= 2000);
subQuery.Context = getNewContextObject();
result.count3 = subQuery.Count();
}
);
}
I'm not sure how exactly this relates to your problem, but none of EF's features are thread-safe, so I would expect that running multiple queries on the same context instance in parallel can blow up for any number of reasons. For example, the queries access the same connection property on the context, but the threads don't know about each other, so one thread can close the connection another thread is using, or replace it with a different connection instance (you can try to open the connection manually before you run the parallel threads and close it once all threads are done, but that doesn't have to be the only problem).