My application receives data from external sources and groups it (matching group items and showing the result as an HTML table), at up to 20k msg/sec.
The problem is shared memory: the application works in real time, and I receive messages with "create, update, delete" flags, so I store everything in RAM with no need for a database. But when I try clustering, messages are missed on some workers (I tried clustering with pm2).
So now I am trying to scale the application with worker threads, but communication via parentPort.postMessage/worker.postMessage would require a lot of changes to the application. Instead I am trying to share memory via SharedArrayBuffer, but I don't understand how to share an array of objects between the main thread and the workers.
const {
  Worker, isMainThread, parentPort, workerData
} = require('worker_threads');

if (isMainThread) {
  const sab = new SharedArrayBuffer(1024);
  sab[0] = {foo: 'bar'}; // this only sets a normal property on the wrapper object, not shared memory
  const worker = new Worker(__filename, {
    workerData: sab
  });
  setInterval(() => {
    console.log( sab[0] ); // always {foo: 'bar'}
  }, 500);
} else {
  let sab = workerData;
  sab.foo = false; // changing "foo" value at worker but not in main thread
}
setInterval(() => {}, 1000);
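For context, a SharedArrayBuffer only shares raw bytes, so structured objects cannot be stored in it directly. A minimal sketch of what does get shared across threads, using a typed-array view and Atomics; the single int32 "foo" slot is a hypothetical layout, not part of the original app:

const { Worker, isMainThread, workerData } = require('worker_threads');

if (isMainThread) {
  // 4 bytes: one int32 slot standing in for a single "foo" field
  const sab = new SharedArrayBuffer(4);
  const view = new Int32Array(sab);

  const worker = new Worker(__filename, { workerData: sab });
  setInterval(() => {
    console.log(Atomics.load(view, 0)); // prints 0, then 1 once the worker has written
  }, 500);
} else {
  const view = new Int32Array(workerData);
  Atomics.store(view, 0, 1); // this write IS visible to the main thread
}

An array of objects therefore has to be flattened onto numeric slots like this, serialized into the buffer, or kept per-thread and synchronized with postMessage.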
Related
I'm trying to build a Node application using worker threads, divided into three parts.
The primary thread that delegates tasks
A dedicated worker thread that updates shared data
A pool of worker threads that run calculations on shared data
The shared data is in the form of several SharedArrayBuffer objects operating like a pseudo-database. I would like to be able to update the data without needing to pause calculations, and I'm ok with a few tasks using slightly stale data. The flow I've come up with is:
Primary thread passes data to update thread
Update thread creates a whole new SharedArrayBuffer and populates it with updated data.
Update thread returns a pointer to the new buffer back to primary thread.
Primary thread caches the latest pointer in a variable, overwriting its previous value, and passes it to each worker thread with each task.
Worker threads don't retain these pointers at all after executing their operations.
The problem is, this seems to create a memory leak in the resident set size when I run a prototype that frequently makes updates and swaps out the shared buffers. Garbage collection appears to make a couple of passes removing the discarded buffers, but then memory climbs continuously until the application slows and eventually hangs or crashes.
How can I guarantee that a SharedArrayBuffer will get picked up by garbage collection when I'm done with it, or is that even possible? I've seen hints to the effect that as long as all references to it are removed from all threads it will eventually get picked up, but not a clear answer.
I'm using the threads.js library to abstract the worker thread operations. Here's a summary of my prototype:
app.ts:
import { ModuleThread, Pool, spawn, Worker } from "threads";
import { WriterModule } from "./workers/writer-worker";
import { CalculateModule } from "./workers/calculate-worker";
class App {
    calculatePool = Pool<ModuleThread<CalculateModule>>(
        () => spawn(new Worker('./workers/calculate-worker')), { size: 6 });

    writerThread: ModuleThread<WriterModule>;
    sharedBuffer: SharedArrayBuffer;
    dataView: DataView;

    constructor() {
        this.sharedBuffer = new SharedArrayBuffer(1000000);
        this.dataView = new DataView(this.sharedBuffer);
    }

    async start(): Promise<void> {
        this.writerThread = await spawn<WriterModule>(new Worker('./workers/writer-worker'));
        await this.writerThread.init(this.sharedBuffer);
        await this.update();
        // Arbitrary delay between updates
        setInterval(() => this.update(), 5000);
        while (true) {
            // Arbitrary delay between tasks
            await new Promise<void>(resolve => setTimeout(() => resolve(), 250));
            this.calculate();
        }
    }

    async update(): Promise<void> {
        const updates: any[] = [];
        // generates updates
        this.sharedBuffer = await this.writerThread.update(updates);
        this.dataView = new DataView(this.sharedBuffer);
    }

    async calculate(): Promise<void> {
        const task = this.calculatePool.queue(async (calc) => calc.calculate(this.sharedBuffer));
        const sum: number = await task;
        // Use result
    }
}
const app = new App();
app.start();
writer-worker.ts:
import { expose } from "threads";

let sharedBuffer: SharedArrayBuffer;

const writerModule = {
    async init(startingBuffer: SharedArrayBuffer): Promise<void> {
        sharedBuffer = startingBuffer;
    },
    async update(data: any[]): Promise<SharedArrayBuffer> {
        // Arbitrary update time
        await new Promise<void>(resolve => setTimeout(() => resolve(), 500));
        const newSharedBuffer = new SharedArrayBuffer(1000000);
        // Copy some values from the old buffer over, perform some mutations, etc.
        sharedBuffer = newSharedBuffer;
        return sharedBuffer;
    },
}

export type WriterModule = typeof writerModule;
expose(writerModule);
calculate-worker.ts:
import { expose } from "threads";

const calculateModule = {
    async calculate(sharedBuffer: SharedArrayBuffer): Promise<number> {
        const view = new DataView(sharedBuffer);
        // Arbitrary calculation time
        await new Promise<void>(resolve => setTimeout(() => resolve(), 100));
        // Run arbitrary calculation
        return sum;
    }
}

export type CalculateModule = typeof calculateModule;
expose(calculateModule);
OK, I'm confused as to the best way to do this.
The following pieces are in play: a Node.js server, a client-side React app (with Redux), and a MySQL DB.
In the client app I have lists (many, but for this issue assume one) that I want to be able to reorder by drag and drop.
In the MySQL DB the items are stored to represent a linked list (with a nextKey, lastKey, and productionKey (primary), along with the data fields):
//mysql column [productionKey, lastKey, nextKey, ...(other data)]
The current issue I'm having is a render issue: it stutters after every change.
I'm using these two functions to get the initial order and to reorder:
function SortLinkedList(linkedList)
{
    var sortedList = [];
    var map = new Map();
    var currentID = null;

    for (var i = 0; i < linkedList.length; i++)
    {
        var item = linkedList[i];
        if (item?.lastKey === null)
        {
            currentID = item?.productionKey;
            sortedList.push(item);
        }
        else
        {
            map.set(item?.lastKey, i);
        }
    }

    while (sortedList.length < linkedList.length)
    {
        var nextItem = linkedList[map.get(currentID)];
        sortedList.push(nextItem);
        currentID = nextItem?.productionKey;
    }

    // undefined entries appear because the server has not fully updated yet, so the linked list is broken;
    // nothing will render without this
    const filteredSafe = sortedList.filter(x => x !== undefined);
    return filteredSafe;
}
const reorder = (list, startIndex, endIndex) => {
    const result = Array.from(list);
    const [removed] = result.splice(startIndex, 1);
    result.splice(endIndex, 0, removed);

    const adjustedResult = result.map((x, i, arr) => {
        if (i == 0) {
            x.lastKey = null;
        } else {
            x.lastKey = arr[i - 1].productionKey;
        }
        if (i == arr.length - 1) {
            x.nextKey = null;
        } else {
            x.nextKey = arr[i + 1].productionKey;
        }
        return x;
    })

    return adjustedResult;
};
I've got this function to get the items:
const getItems = (list, jobList) =>
{
    return list.map((x, i) => {
        const jobName = jobList.find(y => y.jobsessionkey == x.attachedJobKey)?.JobName;
        return {
            id: `ProductionCardM${x.machineID}ID${x.productionKey}`,
            attachedJobKey: x.attachedJobKey,
            lastKey: x.lastKey,
            machineID: x.machineID,
            nextKey: x.nextKey,
            productionKey: x.productionKey,
            content: jobName
        }
    })
}
My onDragEnd:
const onDragEnd = (result) => {
    if (!result.destination) {
        // dropped outside the list
        return;
    }
    const items = reorder(
        state.items,
        result.source.index,
        result.destination.index,
    );
    dispatch(sendAdjustments(items));
    // sends update to server
    // server updates mysql
    // server sends back update events from mysql in packets
    // props sent to DnD component are updated
}
So the actual bug looks like the graphics are glitching, as things get temporarily filtered out in the SortLinkedList function, resulting in jumpy divs. Is there a smoother way to handle this client->server->DB->server->client data flow that results in consistent handling in DnD?
UPDATE:
Still trying to solve this. I have currently implemented a lock pattern.
useEffect(() => {
    if (productionLock) {
        setState({
            items: SortLinkedList(getItems(data, jobList)),
            droppables: [{ id: "Original: not Dynamic" }]
        })
        setLoading(false);
    } else {
        console.log("locking first");
        setLoading(true);
    }
}, [productionLock])
where productionLock is set to true and false by triggers from the server...
Basically: the app sends the data to the server, the server processes the request and sends new data back, and when it's finished the server sends the unlock signal.
That should make this update happen only once, but it does not; the component still re-renders on each state update sent to the app from the server.
What’s the code for sendAdjustments()?
You should update locally first, otherwise DnD pulls the item back to its original position while you wait for the backend to finish. That is what makes it appear glitchy. E.g. (sketched below):
Set the newly reordered list locally as your state
Send network request
If it fails, reverse local list state back to the original list
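A minimal sketch of that optimistic-update flow, rewriting the onDragEnd handler from the question; it assumes sendAdjustments returns an awaitable promise once dispatched and that setState accepts the items key shown earlier, both of which are assumptions about the surrounding app:

const onDragEnd = async (result) => {
    if (!result.destination) {
        // dropped outside the list
        return;
    }

    const previousItems = state.items; // keep the old order around for rollback
    const items = reorder(state.items, result.source.index, result.destination.index);

    // 1. Optimistically render the new order right away
    setState(prev => ({ ...prev, items }));

    try {
        // 2. Persist the new linked-list keys on the server
        await dispatch(sendAdjustments(items));
    } catch (err) {
        // 3. If the request fails, restore the original order
        setState(prev => ({ ...prev, items: previousItems }));
    }
};

With the local state updated first, the server round trip and the SortLinkedList re-sort only confirm an order the UI already shows, so nothing jumps.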
I have an isolated sync server that pulls a tab-delimited text file from an external FTP server and updates (saves) it to MongoDB after processing.
My code looks like this:
const { spawnSync } = require('child_process');

// this function pulls the file from the external ftp server
async function upstreamFile() {
    try {
        // run synchronously so the download completes before the file is parsed
        let pythonProcess = spawnSync('python3', [configVar.ftpInbound, '/outbound/Items.txt', configVar.dataFiles.items], { encoding: 'utf8' });
        logger.info('FTP SERVER LOGS...' + '\n' + pythonProcess.stdout);
        await readItemFile();
        logger.info('The processing of the file is done');
        process.exit();
    } catch (upstreamError) {
        logger.error(upstreamError);
        process.exit();
    }
}
//this function connects to db and calls processing function for each row in the text file.
async function readItemFile() {
    try {
        logger.info('Reading Items File');
        let dataArray = fs.readFileSync(configVar.dataFiles.items, 'utf8').toString().split('\n');
        logger.info('No of Rows Read', dataArray.length);
        await dbConnect.connectToDB(configVar.db);
        logger.info('Connected to Database', configVar.db);
        while (dataArray.length) {
            await Promise.all(dataArray.splice(0, 5000).map(async (f) => {
                const splitValues = f.split('|');
                await processItemsFile(splitValues);
            }));
            logger.info("Current batch finished processing");
        }
        logger.info("ALL batches finished processing");
    }
    catch (PromiseError) {
        logger.error(PromiseError);
    }
}
async function processItemsFile(splitValues) {
    try {
        // Processing of the file is done here and I am using 'save' in mongoose to write to db
        // data is cleaned and assigned to respective fields
        if (!exists) {
            let processedValues = new Products(assignedValues);
            let productDetails = await processedValues.save();
        }
        return;
    }
    catch (error) {
        throw error;
    }
}
upstreamFile()
So this takes about 3 hours to process 100,000 rows and update them in the database.
Is there any way I can speed this up? I am very much limited by the hardware: I am using an EC2 Linux instance with 2 cores and 4 GB RAM.
Should I use worker threads (e.g. microjob) to run this multi-threaded? If yes, how would I go about doing it?
Or is this the maximum performance?
Note: I can't do a bulk update in MongoDB, as Mongoose pre hooks are getting triggered on save.
You can always try a bulk write using the updateOne operation.
I would also consider using a read stream (fs.createReadStream) instead of readFileSync.
With an event-driven architecture you could push, let's say, every 100k updates into array chunks and bulk-update them at once.
You can trigger a pre updateOne() hook (instead of save()) during this operation.
I have solved a similar problem (updating 100k CSV rows) with the following solution:
Create a read stream (thanks to that, your application won't consume much heap memory in the case of huge files)
I'm using the csv-parser npm library to deconstruct a CSV file into separate rows of data:
const fs = require('fs');
const csv = require('csv-parser');

let updates = [];

fs.createReadStream('/filePath').pipe(csv())
    .on('data', row => {
        // ...do anything with the data
        updates.push({
            updateOne: {
                filter: { /* here put the query */ },
                update: [ /* any data you want to update */ ],
                upsert: true /* in my case I want to create the record if it does not exist */
            }
        })
    })
    .on('end', async () => {
        await MyCollection.bulkWrite(updates)
            .catch(err => {
                logger.error(err);
            })
        updates = []; // I just clean up the huge array
    })
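Tying that to the chunking suggestion above, here is a minimal sketch that flushes a bulkWrite every few thousand rows instead of holding everything until the end; the 5000 batch size, the sku filter field, and the '|' separator are assumptions for illustration, not taken from the original code:

const fs = require('fs');
const csv = require('csv-parser');

const BATCH_SIZE = 5000; // hypothetical chunk size
let updates = [];

const stream = fs.createReadStream(configVar.dataFiles.items)
    .pipe(csv({ separator: '|' })); // assumed delimiter, matching the split('|') above

stream.on('data', row => {
    updates.push({
        updateOne: {
            filter: { sku: row.sku },   // hypothetical key field
            update: { $set: row },
            upsert: true
        }
    });
    if (updates.length >= BATCH_SIZE) {
        const batch = updates;
        updates = [];
        stream.pause(); // stop reading while the batch is written
        MyCollection.bulkWrite(batch)
            .catch(err => logger.error(err))
            .finally(() => stream.resume());
    }
});

stream.on('end', async () => {
    if (updates.length) {
        await MyCollection.bulkWrite(updates).catch(err => logger.error(err));
    }
    logger.info('ALL batches finished processing');
});

Each bulkWrite sends a whole batch in one round trip, which removes most of the per-row save() overhead.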
I'm trying to make a task scheduling program on a Node server, and the server will run on multiple computers, so I need an in-memory database to share server state.
I wrote code like the below.
const { createClient } = require('redis');
const redis = createClient();

const notify = () => {
    const runningTaskCount = redis.get('running_task_count');
    if (runningTaskCount >= 10) return;
    redis.incr('running_task_count');
    const taskId = redis.lpop('task_id_queue');
    const task = new Task(taskId);
    task.run();
    task.on('end', () => {
        redis.decr('running_task_count');
        notify();
    });
};

const add = (taskId) => {
    redis.rpush('task_id_queue', taskId);
    notify();
};
If only one server is running, there is no problem with this code.
But with multiple servers running, running_task_count can go over 10.
So, I want to do something like the below:
Lock 'running_task_count'
Get 'running_task_count'
if (running_task_count >= 10) {
    Unlock 'running_task_count'
    return
}
Incr 'running_task_count'
Unlock 'running_task_count'
(go on...)
How can I implement this?
Redis has the ability to run server-side logic via Lua scripts (see the EVAL command). Script execution is atomic, so race conditions are eliminated. A script like the following can do the job:
local val = tonumber(redis.call('GET', KEYS[1])) or 0
if val >= 10 then
    return 0  -- limit reached, caller should not start a task
end
redis.call('INCR', KEYS[1])
return 1  -- slot acquired
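A minimal sketch of calling that script from notify(); it assumes the ioredis client (for its eval(script, numKeys, ...keys) signature) rather than the node-redis client used in the question, and reuses the Task class from above:

const Redis = require('ioredis'); // assumption: ioredis, for its straightforward eval() signature
const redis = new Redis();

const TRY_ACQUIRE_SLOT = `
local val = tonumber(redis.call('GET', KEYS[1])) or 0
if val >= 10 then
  return 0
end
redis.call('INCR', KEYS[1])
return 1
`;

const notify = async () => {
    // Check-and-increment happens atomically inside Redis, so no other server can race past the limit
    const acquired = await redis.eval(TRY_ACQUIRE_SLOT, 1, 'running_task_count');
    if (acquired === 0) return;

    const taskId = await redis.lpop('task_id_queue');
    if (taskId === null) {
        await redis.decr('running_task_count'); // nothing queued, release the slot again
        return;
    }

    const task = new Task(taskId);
    task.run();
    task.on('end', async () => {
        await redis.decr('running_task_count');
        notify();
    });
};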
You're looking for something like redlock:
var Redlock = require('redlock');
// wrap the existing redis client; redlock can take several clients for a multi-node setup
var redlock = new Redlock([redis]);

// the string identifier for the resource you want to lock
var resource = 'running_task_count';

// the maximum amount of time you want the resource locked in milliseconds,
// keeping in mind that you can extend the lock up until
// the point when it expires
var ttl = 1000;

redlock.lock(resource, ttl).then(function(lock) {
    // ...do something here...
    redis.incr('running_task_count');

    // unlock your resource when you are done
    return lock.unlock()
        .catch(function(err) {
            // we weren't able to reach redis; your lock will eventually
            // expire, but you probably want to log this error
            console.error(err);
        });
});
We're having an issue in our main data synchronization back-end function. Our client's mobile device pushes changes daily; however, last week they warned us that some changes weren't updated in the main web app.
After some investigation in the logs, we found that there is indeed a single transaction that fails and rolls back. However, it appears that all the transactions before this one also roll back.
The code works this way. The data to synchronize is an array of "changesets", and each changeset can update multiple tables at once. It's important that a changeset be updated completely or not at all, so each is wrapped in a transaction. Then each transaction is executed one after the other. If a transaction fails, the others shouldn't be affected.
I suspect that all the transactions are actually combined somehow, possibly through the main db.task. Instead of just looping to execute the transactions, we're using a db.task to execute them in a batch and avoid update conflicts on the same tables.
Any advice on how we could execute these transactions in a batch and avoid this rollback issue?
Thanks, here's a snippet of the synchronization code:
// Begin task that will execute transactions one after the other
db.task(task => {
    const transactions = [];
    // Create a transaction for each changeset (propriete/fosse/inspection)
    Object.values(data).forEach((change, index) => {
        const logchange = { tx: index };
        const c = {...change}; // Use a clone of the original change object
        transactions.push(
            task.tx(t => {
                const queries = [];
                // Propriete
                if (Object.keys(c.propriete.params).length) {
                    const params = proprietes.parse(c.propriete.params);
                    const propriete = Object.assign({ idpropriete: c.propriete.id }, params);
                    logchange.propriete = { idpropriete: propriete.idpropriete };
                    queries.push(t.one(`SELECT ${Object.keys(params).join()} FROM propriete WHERE idpropriete = $1`, propriete.idpropriete).then(previous => {
                        logchange.propriete.previous = previous;
                        return t.result('UPDATE propriete SET' + qutil.setequal(params) + 'WHERE idpropriete = ${idpropriete}', propriete).then(result => {
                            logchange.propriete.new = params;
                        })
                    }));
                }
                else delete c.propriete;
                // Fosse
                if (Object.keys(c.fosse.params).length) {
                    const params = fosses.parse(c.fosse.params);
                    const fosse = Object.assign({ idfosse: c.fosse.id }, params);
                    logchange.fosse = { idfosse: fosse.idfosse };
                    queries.push(t.one(`SELECT ${Object.keys(params).join()} FROM fosse WHERE idfosse = $1`, fosse.idfosse).then(previous => {
                        logchange.fosse.previous = previous;
                        return t.result('UPDATE fosse SET' + qutil.setequal(params) + 'WHERE idfosse = ${idfosse}', fosse).then(result => {
                            logchange.fosse.new = params;
                        })
                    }));
                }
                else delete c.fosse;
                // Inspection (rendezvous)
                if (Object.keys(c.inspection.params).length) {
                    const params = rendezvous.parse(c.inspection.params);
                    const inspection = Object.assign({ idvisite: c.inspection.id }, params);
                    logchange.rendezvous = { idvisite: inspection.idvisite };
                    queries.push(t.one(`SELECT ${Object.keys(params).join()} FROM rendezvous WHERE idvisite = $1`, inspection.idvisite).then(previous => {
                        logchange.rendezvous.previous = previous;
                        return t.result('UPDATE rendezvous SET' + qutil.setequal(params) + 'WHERE idvisite = ${idvisite}', inspection).then(result => {
                            logchange.rendezvous.new = params;
                        })
                    }));
                }
                else delete c.inspection;
                // Cheminees
                c.cheminees = Object.values(c.cheminees).filter(cheminee => Object.keys(cheminee.params).length);
                if (c.cheminees.length) {
                    logchange.cheminees = [];
                    c.cheminees.forEach(cheminee => {
                        const params = cheminees.parse(cheminee.params);
                        const ch = Object.assign({ idcheminee: cheminee.id }, params);
                        const logcheminee = { idcheminee: ch.idcheminee };
                        queries.push(t.one(`SELECT ${Object.keys(params).join()} FROM cheminee WHERE idcheminee = $1`, ch.idcheminee).then(previous => {
                            logcheminee.previous = previous;
                            return t.result('UPDATE cheminee SET' + qutil.setequal(params) + 'WHERE idcheminee = ${idcheminee}', ch).then(result => {
                                logcheminee.new = params;
                                logchange.cheminees.push(logcheminee);
                            })
                        }));
                    });
                }
                else delete c.cheminees;
                // Lock from further changes on the mobile device
                // Note: this change will be sent back to the mobile in part 2 of the synchronization
                queries.push(t.result('UPDATE rendezvous SET timesync = now() WHERE idvisite = $1', [c.idvisite]));
                console.log(`transaction#${++transactionCount}`);
                return t.batch(queries).then(result => { // Transaction complete
                    logdata.transactions.push(logchange);
                });
            })
            .catch(function (err) { // Transaction failed for this changeset, rollback
                logdata.errors.push({ error: err, change: change }); // Provide error message and original change object to mobile device
                console.error(JSON.stringify(logdata.errors));
            })
        );
    });
    console.log(`Total transactions: ${transactions.length}`);
    return task.batch(transactions).then(result => { // All transactions complete
        // Log everything that was uploaded from the mobile device
        log.log(res, JSON.stringify(logdata));
    });
});
I apologize, but it is almost impossible to give a good final answer when the question is wrong on too many levels...
It's important that a change set be updated completely or not at all, so each is wrapped in a transaction.
If the change set requires data integrity, the whole thing must be one transaction, and not a set of transactions.
Then each transaction is executed one after the other. If a transaction fails, the others shouldn't be affected.
Again, data integrity is what a single transaction guarantees, you need to make it into one transaction, not multiple.
I suspect that all the transactions are actually combined somehow, possibly through the main db.task.
They are combined, and not through the task, but through the tx method.
Any advice how we could execute these transactions in batch and avoid this rollback issue?
By joining them into a single transaction.
You would use a single tx call at the top, and that's it, no tasks needed there. And in case the code underneath makes use of its own transactions, you can update it to allow conditional transactions.
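A minimal sketch of that shape, with a hypothetical applyChangeset(t, change) helper standing in for the per-changeset queries shown in the question:

// One transaction for the whole synchronization: if any changeset fails,
// the entire batch rolls back together.
db.tx(t => {
    const queries = Object.values(data).map(change => applyChangeset(t, change)); // hypothetical helper
    return t.batch(queries);
})
    .then(() => {
        log.log(res, JSON.stringify(logdata));
    })
    .catch(error => {
        logdata.errors.push({ error });
        log.log(res, JSON.stringify(logdata));
    });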
Also, when building complex transactions, an app benefits a lot from using the repository patterns shown in pg-promise-demo. You can have methods inside repositories that support conditional transactions.
And you should redo your code to avoid the horrible things it does, like manual query formatting. For example, never use things like SELECT ${Object.keys(params).join()}; that's a recipe for disaster. Use the proper query formatting that pg-promise gives you, SQL Names in this case.
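A minimal sketch of what that looks like for the propriete update above; it assumes pg-promise's :name filter for the column list and pgp.helpers.sets() for the SET clause (pgp being the initialized pg-promise library object), used here as a fragment inside the transaction callback:

// $1:name with an array expands to a comma-separated list of escaped column names
queries.push(
    t.one('SELECT $1:name FROM propriete WHERE idpropriete = $2',
          [Object.keys(params), propriete.idpropriete])
        .then(previous => {
            logchange.propriete.previous = previous;
            // helpers.sets() generates a fully escaped "col" = value, ... assignment list
            return t.none('UPDATE propriete SET $1:raw WHERE idpropriete = $2',
                          [pgp.helpers.sets(params), propriete.idpropriete]);
        })
        .then(() => {
            logchange.propriete.new = params;
        })
);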