Node Postgres COPY FROM failing silently - node.js

I am trying to use PostgreSQL's COPY FROM API to stream potentially-thousands of records into a database as they are dynamically generated in node.js code. To do so, I wrote this generic wrapper function:
function streamRows(client, { table, columns, data }) {
    return new Promise((resolve, reject) => {
        const sqlStream = client.query(
            copyFrom(`COPY ${ table } (${ columns.join(', ') }) FROM STDIN`));
        const rowStream = new Readable();
        rowStream.pipe(sqlStream)
            .on('finish', resolve)
            .on('error', reject);
        for (const row of data) {
            rowStream.push(`${ row.join('\t') }\n`);
        }
        rowStream.push('\\.\n');
        rowStream.push(null);
    });
}
The database table I'm writing into looks like this:
CREATE TABLE devices (
    id SERIAL PRIMARY KEY,
    group_id INTEGER REFERENCES groups(id),
    serial_number CHAR(12) NOT NULL,
    status INTEGER NOT NULL
);
And I am calling it as follows:
function *genRows(id, devices) {
    let count = 0;
    for (const serial of devices) {
        yield [ id, serial, UNSTARTED ];
        count++;
        if (count % 10 === 0) log.info(`Streamed ${ count } rows...`);
    }
    log.info(`Streamed ${ count } rows.`);
}

await streamRows(client, {
    table: 'devices',
    columns: [ 'group_id', 'serial_number', 'status' ],
    data: genRows(id, devices),
});
The log statements in my generator function that's producing the per-row data all run as expected, and the output indicates that it is in fact always running the generator to completion, and streaming all the data rows I want. No errors are ever thrown. But if I wait for it to complete, the table sometimes ends up with 0 rows added to it--i.e., it looks like I sent all that data to Postgres, but none of it was actually inserted. What am I doing wrong?

I do not know exactly what parts of this made the difference and what is purely stylistic, but after playing around with a bunch of different examples from across the web, I managed to cobble together this function which works:
function streamRows(client, { table, columns, data }) {
    return new Promise((resolve, reject) => {
        const iterator = data[Symbol.iterator]();
        const rs = new Readable();
        const ws = client.query(copyFrom(`COPY ${ table } (${ columns.join(', ') }) FROM STDIN`));
        rs._read = function() {
            const { value, done } = iterator.next();
            rs.push(done ? null : `${ value.join('\t') }\n`);
        };
        rs.on('error', reject);
        ws.on('error', reject);
        ws.on('end', resolve);
        rs.pipe(ws);
    });
}
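For reference, the same wrapper can likely be written more compactly with Node's built-in Readable.from() and stream.pipeline(), which handle backpressure and error propagation for you. This is an untested sketch that assumes Node 12+ and the pg-copy-streams package that the copyFrom call above comes from:

```js
const { Readable, pipeline } = require('stream');
const { promisify } = require('util');
const { from: copyFrom } = require('pg-copy-streams');

const pipelineAsync = promisify(pipeline);

async function streamRows(client, { table, columns, data }) {
    // Wrap the row iterable in a Readable that emits one tab-separated line per row.
    const rows = Readable.from((function* () {
        for (const row of data) {
            yield `${ row.join('\t') }\n`;
        }
    })());
    const copyStream = client.query(copyFrom(`COPY ${ table } (${ columns.join(', ') }) FROM STDIN`));
    // pipeline() rejects if either stream errors and resolves when the COPY finishes.
    await pipelineAsync(rows, copyStream);
}
```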

Related

NodeJS organize loop with repeated records

I have a table where I store the sales of certain products. I need to fetch the records by client id.
I did it like this:
public async getData(clientID: any): Promise<any> {
    try {
        return await client
            .scan({
                TableName: "dbSales",
                FilterExpression: "contains(clientID, :clientID)",
                ExpressionAttributeValues: {
                    ":clientID": clientID,
                },
            })
            .promise()
            .then(async (response) => {
                let data = [];
                for (let i = 0; i < response.Count; i++) {
                    if (data.filter(product => product.productId == response.Items[i]['productId']) != undefined) {
                        const resultData = await this.getProduct(response.Items[i]['productId']);
                        data.push(resultData);
                    }
                }
                return {
                    status: 200,
                    data: data,
                };
            })
            .catch((error: AxiosError) => {
                throw error;
            });
    } catch (e) {
        throw new HttpError(500, e.message);
    }
}
This fetches all the records with that client ID and then, in the loop, looks up the product name for each record it found via another function (getProduct).
The problem is that I'm getting a lot of repeated results, i.e. multiple rows for the same product ID and client.
When there is more than one record in dbSales with the same client ID and the same product ID, I need it to return only one result for those records.
That is, one row per distinct client ID / product ID combination.
This also makes the lookup slow, because I only need to know which products a particular client has purchased, without repeating the information when they have bought the same product more than once.
Problem solved! Solution:
The solution presented by our friend #jarmod solved my problem, in a simple way.
let result = response.Items.filter((e, i) => {
    return response.Items.findIndex((x) => {
        return x.productId == e.productId;
    }) == i;
});

for (let i = 0; i < result.length; i++) {
    if (data.filter(product => product.productId == result[i]['productId']) != undefined) {
        const resultData = await this.getProduct(result[i]['productId']);
        data.push(resultData);
    }
}
Proposed solution: How to remove duplicates objects in array based on 2 properties?
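A sketch of the same idea extended to both properties from the question (clientID and productId), using a Set of composite keys:

```js
// De-duplicate on both clientID and productId before looking up product names.
const seen = new Set();
const uniqueItems = response.Items.filter((item) => {
    const key = `${item.clientID}|${item.productId}`;
    if (seen.has(key)) return false;
    seen.add(key);
    return true;
});
```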

BatchWrite in AWS dynamo db skipping some items

I am trying to write items to AWS DynamoDB using the Node SDK. The problem I am facing is that when I write batch items to AWS in parallel using threads, some of the items are not written to the database. The number of items written is random: if I run my code 3 times, one time it might be 150, the next 200, and the third 135. In addition, when I write the items sequentially without threads, some of the items are still not written, although in this case fewer items go missing; for instance, if the total number of items is 300 then 298 are written. I investigated the problem to see if there were any unprocessed items, but the batchWrite method returns nothing, which suggests that all the items are being processed correctly. Please note that I have on-demand provisioning for the database in question, so I do not expect any throttling issues. So here is my code.
exports.run = async function() {
    // This is the function which runs first !!!!!
    const data = await getArrayOfObjects();
    console.log("TOTAL PRICE CHANGES")
    console.log(data.length)
    const batchesOfData = makeBatches(data)
    const threads = new Set();
    console.log("**********")
    console.log(batchesOfData.length)
    console.log("**********")
    for(let i = 0; i < batchesOfData.length; i++) {
        console.log("BATCH!!!!!")
        console.log(i)
        console.log(batchesOfData[i].length)
        // Sequential approach
        const response = await compensationHelper.createItems(batchesOfData[i])
        console.log("RESPONSE")
        console.log(response)
        // Parallel approach
        // const workerResult = await runService(batchesOfData[i])
        // console.log("WORKER RESUULT!!!!")
        // console.log(workerResult);
    }
}
exports.updateItemsInBatch = async function(data, tableName) {
    console.log("WRITING DATA")
    console.log(data.length)
    const batchItems = {
        RequestItems: {},
    };
    batchItems.RequestItems[tableName] = data;
    try {
        const result = await documentClient.batchWrite(batchItems).promise();
        console.log("UNPROCESSED ITEMS")
        console.log(result)
        if (result instanceof Error) {
            console.log(`[Error]: ${JSON.stringify(Error)}`);
            throw new Error(result);
        }
        return Promise.resolve(true);
    } catch (err) {
        console.error(`[Error]: ${JSON.stringify(err.message)}`);
        return Promise.reject(new Error(err));
    }
};
exports.convertToAWSCompatibleFormat = function(data) {
    const awsCompatibleData = [];
    data.forEach(record => awsCompatibleData.push({ PutRequest: { Item: record } }));
    return awsCompatibleData;
};
const createItems = async function(itemList) {
    try {
        const objectsList = [];
        for (let index = 0; index < itemList.length; index++) {
            try {
                const itemListObj = itemList[index];
                const ObjToBeInserted = {
                    // some data assignments here
                };
                objectsList.push(ObjToBeInserted);
                if (
                    objectsList.length >= AWS_BATCH_SIZE ||
                    index === itemList.length - 1
                ) {
                    const awsCompatiableFormat = convertToAWSCompatibleFormat(
                        objectsList
                    );
                    await updateItemsInBatch(
                        awsCompatiableFormat,
                        process.env.myTableName
                    );
                }
            } catch (error) {
                console.log(`[Error]: ${JSON.stringify(error)}`);
            }
        }
        return Promise.resolve(true);
    } catch (err) {
        return Promise.reject(new Error(err));
    }
};
const makeBatches = products => {
    const productBatches = [];
    let countr = -1;
    for (let index = 0; index < products.length; index++) {
        if (index % AWS_BATCH_SIZE === 0) {
            countr++;
            productBatches[countr] = [];
            if (countr === MAX_BATCHES) {
                break;
            }
        }
        try {
            productBatches[countr].push(products[index]);
        } catch (error) {
            continue;
        }
    }
    return productBatches;
};
async function runService(workerData) {
    return new Promise((resolve, reject) => {
        const worker = new Worker(path.join(__dirname, './worker.js'), { workerData });
        worker.on('message', resolve);
        worker.on('error', reject);
        worker.on('exit', (code) => {
            if (code !== 0)
                reject(new Error(`Worker stopped with exit code ${code}`));
        })
    })
}
// My worker file
'use strict';
const { workerData, parentPort } = require('worker_threads')
const creatItems = require('myscripts')
// You can do any heavy stuff here, in a synchronous way
// without blocking the "main thread"
console.log("I AM A NEW THREAD")
createItems(workerData)
// console.log('Going to write tons of content on file '+workerData);
parentPort.postMessage({ fileName: workerData, status: 'Done' })
From boto3 documentation:
If one or more of the following is true, DynamoDB rejects the entire batch write operation:
One or more tables specified in the BatchWriteItem request does not exist.
Primary key attributes specified on an item in the request do not match those in the corresponding table's primary key schema.
You try to perform multiple operations on the same item in the same BatchWriteItem request. For example, you cannot put and delete the same item in the same BatchWriteItem request.
Your request contains at least two items with identical hash and range keys (which essentially is two put operations).
There are more than 25 requests in the batch.
Any individual item in a batch exceeds 400 KB.
The total request size exceeds 16 MB.
To me, it looks like some of this is true. At my job, we also had a problem where one batch contained two items with identical partition and sort keys, so the whole batch was discarded. I know it's not node.js, but we used this to overcome that problem.
It is batch_writer(overwrite_by_pkeys), which is used to keep only the last occurrence of any given partition and sort key within the batch. If only a small portion of your data is duplicate data and you do not need to save it, you can use this. BUT if you need to save all of your data, I do not advise you to use this functionality.
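A rough node.js sketch of the same idea is to de-duplicate each batch on its key attributes before calling batchWrite; pk and sk below are placeholders for your table's actual partition and sort key names:

```js
// Keep only the last PutRequest for any given (pk, sk) pair, mirroring
// boto3's overwrite_by_pkeys behaviour. pk/sk are placeholder key names.
function dedupeBatch(putRequests) {
    const byKey = new Map();
    for (const req of putRequests) {
        const item = req.PutRequest.Item;
        byKey.set(`${item.pk}|${item.sk}`, req); // later items overwrite earlier ones
    }
    return [...byKey.values()];
}
```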
I don't see where you are checking the response for UnprocessedItems. Batch operations will often return a list of items it didn't process. As is documented, BatchWriteItem "can write up to 16 MB of data, which can comprise as many as 25 put or delete requests."
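For illustration, a minimal retry sketch (assuming the AWS SDK v2 DocumentClient used in the question and batches of at most 25 requests):

```js
// Keep re-submitting whatever comes back in UnprocessedItems until nothing is left.
async function batchWriteWithRetry(documentClient, tableName, requests) {
    let pending = { [tableName]: requests };
    while (Object.keys(pending).length > 0) {
        const result = await documentClient.batchWrite({ RequestItems: pending }).promise();
        pending = result.UnprocessedItems || {};
        if (Object.keys(pending).length > 0) {
            // Simple fixed backoff before retrying; exponential backoff would be better.
            await new Promise(resolve => setTimeout(resolve, 500));
        }
    }
}
```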
I had a duplicate keys issue, meaning the partition and sort keys had duplicate values within the batch; however, in my case no error was returned from the AWS BatchWrite method when my timestamp only had a precision of hundredths of a second (2020-02-09T08:02:36.71), which was a bit surprising. I resolved the issue by making my createdAt (the sort key) more granular, like this => 2020-02-09T08:02:36.7187, thus making it non-repeating.
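For illustration, one rough way to get a strictly non-repeating sort key in node.js (where Date#toISOString only gives millisecond precision) is to append a per-process counter; a sketch:

```js
// Appends a monotonically increasing suffix so two writes in the same
// millisecond still produce distinct sort-key values.
let seq = 0;
function uniqueCreatedAt() {
    seq = (seq + 1) % 10000;
    return `${new Date().toISOString()}-${String(seq).padStart(4, '0')}`;
}
```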

Create promises on queries inside a for loop

I'm trying to write Node.js code that does the following:
1. Connect to a Salesforce instance.
2. Get the past 7 days and loop through them.
3. Run 2 queries inside the loop and push the results to an array.
4. Display the value in another function.
Here is my JS code.
var jsforce = require("jsforce");
var moment = require('moment');

function connectToEP() {
    var main_Obj = {};
    var response_Obj = {};
    var pastSevenDaysArray = [];
    var conn = new jsforce.Connection();
    var beforeSevenDays = moment().subtract(7, 'days').format('YYYY-MM-DD');
    var today = moment().startOf('day');
    var i = 0;
    conn.login("myUid", "myPwd").then(() => {
        console.log("Connected To Dashboard");
        for (var m = moment(beforeSevenDays); m.diff(today, 'days') <= 0; m.add(1, 'days')) {
            conn.query("SELECT SUM(Total_ETA_of_all_tasks__c), SUM(Total_ETA__C) from Daily_Update__c where DAY_ONLY(createddate)= " + m.format('YYYY-MM-DD')).then(() => {
                console.log("B1");
                var z = response_Obj.aggrRes;
                response_Obj.aggrRes = res;
                pastSevenDaysArray.push({ z: res });
                console.log("B1 Exit");
            }).then(() => {
                conn.query("SELECT count(Id), Task_Type__c FROM Daily_Task__c where DAY_ONLY(createddate) = " + m.format('YYYY-MM-DD') + " group by Task_Type__c").then(() => {
                    console.log("B2");
                    var z = response_Obj.aggrRes;
                    response_Obj.aggrRes = res;
                    pastSevenDaysArray.push({ z: res });
                    console.log("B2 Exit");
                })
            })
        }
        return Promise.resolve(pastSevenDaysArray);
    }).then((data) => {
        console.log(typeof data);
        updateMessage(JSON.stringify(data));
        console.log(typeof data);
    });
}

function updateMessage(message) {
    console.log("XXXXXXXXXXXX");
    console.log(message);
    console.log("XXXXXXXXXXXX");
}

function socketNotificationReceived() {
    console.log("socket salesforce rec");
    connectToEP();
}

socketNotificationReceived();
When I run this code, the output that I get is:
socket salesforce rec
Connected To Dashboard
object
XXXXXXXXXXXX
[]
XXXXXXXXXXXX
object
B1
B1
B1
B1
B1
B1
B1
B1
I'm very new to this JS platform and haven't got the hang of promises yet :(. Please let me know where I am going wrong and how I can fix it.
An explanation of what's going on would be very helpful for my future projects.
Thanks
The thing I always do when I get confused is to decompose. Build the pieces one by one, and make sure each works. Trying to understand your code, I get something like this...
A function each for logging in, getting a "task sum" from the db and getting a "task count" from the db. (Task sum/count is what I guessed the queries were up to. Rename as you see fit).
var jsforce = require("jsforce");
var moment = require('moment');

function login(conn) {
    return conn.login("myUid", "myPwd");
}

function queryTaskSumForDay(conn, m) {
    return conn.query("SELECT SUM(Total_ETA_of_all_tasks__c), SUM(Total_ETA__C) from Daily_Update__c where DAY_ONLY(createddate)= " + m.format('YYYY-MM-DD'));
}

function queryTaskCountForDay(conn, m) {
    return conn.query("SELECT count(Id), Task_Type__c FROM Daily_Task__c where DAY_ONLY(createddate) = " + m.format('YYYY-MM-DD') + " group by Task_Type__c");
}
With those working, it should be easy to get a sum and a count for a given day. Rather than returning these in an array (containing two objects that each have a "z" property as your code did), I opted for the simpler single object that has a sum and count property. You may need to change this to suit your design. Notice the use of Promise.all() to resolve two promises together...
function sumAndCountForDay(conn, m) {
    let sum = queryTaskSumForDay(conn, m);
    let count = queryTaskCountForDay(conn, m);
    return Promise.all([sum, count]).then(results => {
        return { sum: results[0], count: results[1] };
    });
}
With that working, it should be easy to get an array of sum-count objects for a period of seven days using your moment logic and the Promise.all() idea...
function sumAndCountForPriorWeek(conn) {
    let promises = [];
    let beforeSevenDays = moment().subtract(7, 'days').format('YYYY-MM-DD');
    let today = moment().startOf('day');
    for (let m = moment(beforeSevenDays); m.diff(today, 'days') <= 0; m.add(1, 'days')) {
        promises.push(sumAndCountForDay(conn, m));
    }
    return Promise.all(promises);
}
With that working (notice the pattern here?), your OP function is tiny and nearly fully tested, because we tested all of its parts...
function connectToEP() {
    let conn = new jsforce.Connection();
    return login(conn).then(() => {
        return sumAndCountForPriorWeek(conn);
    }).then(result => {
        console.log(JSON.stringify(result));
        return result;
    }).catch(error => {
        console.log('error: ' + JSON.stringify(error));
        return error;
    });
}
I think your general structure should be something like this. The biggest issue is not returning promises when you need to. A "for loop" of promises is a little difficult to step into, but if you can run them in parallel then the easiest thing to do is Promise.all. If you need to aggregate the data before you can perform the next query, then you need to do multiple Promise.all().then()'s. The reason you get an empty array [] is that your for loop creates the promises but doesn't wait until they finish.
var jsforce = require("jsforce");
var moment = require('moment');

function connectToEP() {
    var conn = new jsforce.Connection(); // as in the original code
    // connectToEP now returns a promise
    return conn.login("myUid", "myPwd").then(() => {
        console.log("Connected To Dashboard");
        let myQueries = [];
        for (start; condition; incrementer) {
            myQueries.push( // Add all these query promises to the parallel queue
                conn.query(someQuery)
                    .then((res) => {
                        return res;
                    })
                    .then((res) => {
                        return conn.query(someQuery).then((res) => {
                            return someData;
                        });
                    })
            );
        }
        return Promise.all(myQueries); // Waits for all queries to finish...
    }).then((allData) => { // allData is an array of all the promise results
        return updateMessage(JSON.stringify(allData));
    });
}

How to bulk insert in Mongoose cursor.eachAsync?

I have tried several solutions to get this working but all failed. I am reading MongoDB docs using cursor.eachAsync() and converting some doc fields. I need to move these docs to another collection after conversion. My idea is that after 1000 docs are processed, they should be bulk-inserted into the destination collection. This works well until the last batch of records, which is less than 1000. To phrase the same problem differently, if the number of remaining records is < 1000 then they are not inserted.
1. First version - bulk insert after eachAsync()
Just like any other code, I should have the remaining docs (< 1000) in the bulk object after eachAsync() and should be able to insert them. But I find bulk.length is 0. (I have removed those statements in the code snippet below.)
```js
async function run() {
    await mongoose.connect(dbPath, dbOptions);
    const cursor = events.streamEvents(query, 10);
    let successCounter = 0;
    let bulkBatchSize = 1000;
    let bulkSizeCounter = 0;
    let sourceDocsCount = 80;
    var bulk = eventsConvertedModel.collection.initializeOrderedBulkOp();
    await cursor.eachAsync((doc) => {
        let pPan = new Promise((resolve, reject) => {
            getTokenSwap(doc.panTokenIdentifier, doc._id)
                .then((swap) => {
                    resolve(swap);
                });
        });
        let pXml = new Promise((resolve, reject) => {
            let xmlObject;
            getXmlObject(doc)
                .then(getXmlObjectToken)
                .then((newXmlString) => {
                    resolve(newXmlString);
                })
                .catch(errFromPromise1 => {
                })
                .catch(error => {
                    reject(error);
                });
        });
        Promise.all([pPan, pXml])
            .then(([panSwap, xml]) => {
                doc.panTokenIdentifier = panSwap;
                doc.eventRecordTokenText = xml;
                return doc;
            })
            .then((newDoc) => {
                successCounter++;
                bulkSizeCounter++;
                bulk.insert(newDoc);
                if (bulkSizeCounter % bulkBatchSize == 0) {
                    bulk.execute()
                        .then(result => {
                            bulkSizeCounter = 0;
                            let msg = "Conversion- bulk insert =" + result.nInserted;
                            console.log(msg);
                            bulk = eventsConvertedModel.collection.initializeOrderedBulkOp();
                            Promise.resolve();
                        })
                        .catch(bulkErr => {
                            logger.error(bulkErr);
                        });
                }
                else {
                    Promise.resolve();
                }
            })
            .catch(err => {
                console.log(err);
            });
    });
    console.log("outside-async=" + bulk.length); // always 0
    console.log("run()- Docs converted in this run =" + successCounter);
    process.exit(0);
}
```
2. Second version - track the expected number of iterations and, after all iterations, change the batch size (to, say, 10).
Result - The batch size value changes but it's not reflected in bulk.insert. The records are lost.
3. Same as 2nd but insert one record at a time after bulk inserts are done.
```js
let d = eventsConvertedModel(newDoc);
d.isNew = true;
d._id = mongoose.Types.ObjectId();
d.save().then(saved => {
    console.log(saved._id);
    Promise.resolve();
}).catch(saveFailed => {
    console.log(saveFailed);
    Promise.resolve();
});
```
Result - I was getting a DocumentNotFound error, so I added d.isNew = true. But for some reason only a few records get inserted and many of them get lost.
I have also tried other variations using the number of expected bulk-insert iterations. Finally, I changed the code to write to a file (one doc at a time), but I am still wondering if there is any way to make the writes to the DB work.
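For reference, here is a minimal sketch of the pattern I was aiming for; it relies on eachAsync waiting for the promise returned from its callback, and convertDoc() is a hypothetical helper standing in for the pPan/pXml conversion above:

```js
// Await each conversion inside eachAsync, flush every full batch,
// then flush whatever is left once the cursor is exhausted.
async function runSketch() {
    const cursor = events.streamEvents(query, 10);
    let bulk = eventsConvertedModel.collection.initializeOrderedBulkOp();
    let pending = 0;

    await cursor.eachAsync(async (doc) => {
        const newDoc = await convertDoc(doc); // hypothetical helper
        bulk.insert(newDoc);
        pending++;
        if (pending === 1000) {
            await bulk.execute();
            bulk = eventsConvertedModel.collection.initializeOrderedBulkOp();
            pending = 0;
        }
    });

    if (pending > 0) {
        await bulk.execute(); // final partial batch (< 1000 docs)
    }
}
```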
Dependencies:
Node v8.0.0
Mongoose 5.2.2

How to use promises in a loop?

I am trying to write a cron function in Node.js which fetches the user_ids of all the users from the db, and then I want to iterate through each user_id.
Here is my code :
cron.schedule('43 11 * * *', function(){
    var now = moment()
    var formatted = now.format('YYYY-MM-DD HH:mm:ss')
    console.log('Starting the cron boss!');

    var dbSelectPromise = function(db, sql1) {
        return new Promise(function(resolve, reject) {
            db.select(sql1, function(err, data) {
                if (err) {
                    reject(err)
                } else {
                    resolve(data)
                }
            })
        })
    }

    var users = []
    var sql = "select distinct(user_id) from user_level_task"
    dbSelectPromise(db, sql).then(function(secondResult){
        for(i=0;i<secondResult.length;i++){
            var sql1 = "select max(level_id) as level from user_level_task where user_id ="+secondResult[i].user_id
            dbSelectPromise(db, sql1).then(function(thirdResult){
                console.log(thirdResult)
                console.log(current)
                var sql2 = "select task_id form user_level_task where user_id = '"+secondResult[i].user_id+"' and level_id = '"+thirdResult[0].level+"' "
                dbSelectPromise(db, sql2).then(function(fourthResult){
                    var leng = fourthResult.length
                    for(i=0;i<leng;i++){
                        console.log(fourthResult[i])
                    }
                })
            })
        }
    })
});
The problem I am facing is that I cannot access the value of i in the third and fourth promises. Please help!
I think what's happening is that i is no longer the same when you create those new promises because the for loop is still running. It appears that what you really need is the user_id and level_id. I suggest you restructure your code a bit to reduce nesting and pass on the values you need for future promises.
Perhaps something similar to this:
dbSelectPromise(db, sql)
    .then(secondResult => {
        const levelPromises = [];
        secondResult.forEach(res => {
            levelPromises.push(getLevelByUserId(res.user_id, db));
        });
        return Promise.all(levelPromises); // Promise.all only if you want to handle all success cases
    })
    .then(result => {
        result.forEach(level => {
            const { userId, queryResult } = level;
            // ...
        });
        //...
    })
    .catch(err => {
        console.log(err);
    });

function getLevelByUserId(userId, db) {
    const query = `select max(level_id) as level from user_level_task where user_id = ${userId}`;
    return dbSelectPromise(db, query).then(result => ({ userId, result }));
}
It creates an array of all the get-level queries as promises and then passes it along to the next step using Promise.all(), which will only resolve if all the queries were successful. At that point, you will again have access to the userId of each result, because we returned it from our new function, ready for your next set of queries.
I think you should abstract your queries a bit further instead of using a generic dbSelectPromise, and don't forget to catch() at the end, otherwise you won't know what's happening.
Note: This assumes your db variable is instantiated properly and that your original db.select doesn't need to be returned, depending on whatever library you're using. There's also some new syntax in there.
The problem i am facing is i cannot access value of i in third and fourth promises. Please help!
This is because you're reinitializing i without declaring it with let. Because the loop keeps running, by the time the promises resolve the value of i is different from what you expect.
each promise is dependent on the other and needs to run synchronously
For this to work, you need to chain promises. Also, you can make use of Promise.all() to execute a bunch of promises at once. Remember, Promise.all() is all or nothing.
Making those changes to your code, I get the following structure.
'use strict';
let _ = require('lodash');

function dbSelectPromise(db, sql1) {
    return new Promise((resolve, reject) => {
        return db.select(sql1, (err, data) => {
            if (err) {
                return reject(err);
            }
            return resolve(data);
        });
    });
}

function process(user) {
    let sql1 = "select max(level_id) as level from user_level_task where user_id =" + user.user_id;
    return dbSelectPromise(db, sql1).then(function (thirdResult) {
        console.log(thirdResult);
        let sql2 = "select task_id form user_level_task where user_id = '" + user.user_id + "' and level_id = '" + thirdResult[0].level + "' ";
        return dbSelectPromise(db, sql2);
    });
}

function getUsers() {
    let sql = "select distinct(user_id) from user_level_task";
    return dbSelectPromise(db, sql).then((users) => {
        return users;
    }).catch(() => {
        return [];
    });
}

cron.schedule('43 11 * * *', function () {
    var now = moment();
    var formatted = now.format('YYYY-MM-DD HH:mm:ss');
    getUsers().then((users) => {
        let batches = _.map(users, function (user) {
            return process(user);
        });
        return Promise.all(batches);
    }).then((fourthResult) => {
        console.log('Your fourthResult [[],..]', fourthResult);
    }).catch((err) => {
        console.log('err while processing', err);
    });
});
