How to: Sequential db.batch with pg-promise - node.js

I cannot figure out how to execute a batch call of generated queries sequentially.
I am trying to truncate every table in the DB. My code:
db.any(`
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema='public'
    AND table_type='BASE TABLE';
`)
    .then(res => res.map(item => item.table_name)) // To get only an array with the names
    .then(tables => tables.map(tableName => db.none(`TRUNCATE TABLE ${tableName} CASCADE`))) // ES6 template strings, because the table name must be bare here (no quotes)
    .then(queries => db.tx(t => t.batch(queries)))
I get deadlock detected errors. It's clear why: the queries cascade and try to truncate the same table as another query. That's why I need to run the queries sequentially, but I can't figure out how to do it. I tried using db.sequence(), but I got the same errors. What is the proper way to execute generated queries sequentially with pg-promise? Thanks a lot.

Syntax supported by pg-promise is very flexible. Below is just one such syntax, which is the easiest to use for your case, and the most modern one:
await db.tx(async t => {
    const tables = await t.map(`
        SELECT table_name
        FROM information_schema.tables
        WHERE table_schema = $1
        AND table_type = $2
    `, ['public', 'BASE TABLE'], a => a.table_name);
    for (let i = 0; i < tables.length; i++) {
        await t.none('TRUNCATE TABLE $1:name CASCADE', tables[i]);
    }
});
Regarding this comment in your code:
// ES6 template strings, because the table name must be bare here (no quotes)
That is wrong: names must be in double quotes, which the SQL Names filter provides.
Also note this warning from the pg-promise documentation:
Never use the reserved ${} syntax inside ES6 template strings, as those have no knowledge of how to format values for PostgreSQL.
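As a rough illustration of what the SQL Names filter does for the table name, the sketch below wraps an identifier in double quotes and doubles any embedded double quote. quoteIdent and truncateStatement are hypothetical helpers written for this example, not part of pg-promise; in real code, let $1:name do the quoting.

```javascript
// Hypothetical helper, for illustration only: mimics the general idea of
// quoting a PostgreSQL identifier. In real code, rely on the :name filter.
function quoteIdent(name) {
    if (typeof name !== 'string' || name.length === 0) {
        throw new TypeError('identifier must be a non-empty string');
    }
    // An embedded double quote is escaped by doubling it.
    return '"' + name.replace(/"/g, '""') + '"';
}

// Builds the statement text that the TRUNCATE loop would send for one table.
function truncateStatement(tableName) {
    return 'TRUNCATE TABLE ' + quoteIdent(tableName) + ' CASCADE';
}

console.log(truncateStatement('Order Items'));
// TRUNCATE TABLE "Order Items" CASCADE
```

A bare, unquoted name would break on a table such as Order Items, which is why the quoting matters.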

Related

Query parameters in LIKE Statement causes slow response

I'm trying to query Google Cloud Spanner with query parameters using the Node.js client library.
However, the response is much slower with a query parameter than without one.
The query has a LIKE (prefix match) condition. I couldn't find a recommended way to use query parameters with LIKE.
Additionally, I tested with an equality condition, and there was no difference between the query with a parameter and the query without one.
The table has more than 20 million rows, and the instance has 1 node.
Is there any solution? Or is this a bug in Google Spanner?
Part of the schema (it actually has more than 40 columns):
CREATE TABLE props (
props__id STRING(MAX) NOT NULL,
props__address_quadkey STRING(MAX),
...
) PRIMARY KEY (props__id)
Index:
CREATE INDEX props__address_quadkey
ON props (
props__address_quadkey
)
Test code:
const Spanner = require('@google-cloud/spanner');
const spanner = new Spanner();
const db = spanner
    .instance('instance_name')
    .database('database_name');
(async () => {
    // Make connection
    await db.run({ sql: 'SELECT 1' });
    console.time('Without param');
    const r1 = (await db.run({
        sql: `
            SELECT
                props__id
            FROM props@{FORCE_INDEX=props__address_quadkey}
            WHERE
                (props__address_quadkey LIKE '1330020303011010200%')
        `
    }))[0];
    console.log(r1.length); // 121
    console.timeEnd('Without param'); // Without param: 277.223ms
    console.time('with param 1');
    const r2 = (await db.run({
        sql: `
            SELECT
                props__id
            FROM props@{FORCE_INDEX=props__address_quadkey}
            WHERE
                (props__address_quadkey LIKE @quadkey)
        `,
        params: { quadkey: '1330020303011010200%' },
        types: { quadkey: 'string' },
    }))[0];
    console.log(r2.length); // 121
    console.timeEnd('with param 1'); // with param 1: 9240.822ms
})();
Thank you for your help!
This is currently a limitation of Cloud Spanner. With a constant value for the LIKE pattern, Cloud Spanner is able to optimize the lookup expression based on the LIKE pattern during query compilation. For example, in this case, Cloud Spanner will be able to generate a query plan with a lookup expression that is basically
STARTS_WITH(props__address_quadkey, '1330020303011010200')
which will be able to efficiently search the index for entries that match the prefix in the LIKE pattern.
But with a parameterized LIKE pattern, that is not possible, as the parameter is not evaluated until execution time and could contain any LIKE expression. As a result, instead of being able to efficiently look up the matching rows, Cloud Spanner must read all rows and evaluate them against the LIKE pattern in the parameter to filter out non-matching rows.
This limitation however does not affect simpler predicates like the equality predicate where Cloud Spanner is able to do efficient lookups based on the value of the parameter.
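One workaround sometimes used for prefix searches, which the answer above does not confirm, is to express the prefix as a range predicate so that only simple comparisons are parameterized. The sketch below is a minimal illustration under that assumption: prefixRange and buildPrefixQuery are hypothetical helpers, the parameter names prefix_start and prefix_end are made up, only non-empty ASCII prefixes are handled, and whether Cloud Spanner actually chooses an efficient index scan for this form should be checked against the query plan.

```javascript
// Hypothetical helper: computes the half-open range [start, end) that contains
// every string beginning with `prefix`. Assumes a non-empty, ASCII-only prefix.
function prefixRange(prefix) {
    if (!/^[\x00-\x7e]+$/.test(prefix)) {
        throw new RangeError('prefix must be non-empty ASCII (0x00-0x7e)');
    }
    const last = prefix.charCodeAt(prefix.length - 1);
    // Incrementing the last character gives the smallest string above the range.
    const end = prefix.slice(0, -1) + String.fromCharCode(last + 1);
    return { start: prefix, end };
}

// Hypothetical helper: builds a request object in the same shape as the
// question's db.run() calls, without executing anything.
function buildPrefixQuery(prefix) {
    const { start, end } = prefixRange(prefix);
    return {
        sql: `
            SELECT props__id
            FROM props@{FORCE_INDEX=props__address_quadkey}
            WHERE props__address_quadkey >= @prefix_start
              AND props__address_quadkey < @prefix_end
        `,
        params: { prefix_start: start, prefix_end: end },
        types: { prefix_start: 'string', prefix_end: 'string' },
    };
}

console.log(buildPrefixQuery('1330020303011010200').params);
```

The resulting object could be passed to db.run() in place of the LIKE query and timed the same way as in the question.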

Log specific postgresql query using pg-promise

I am using the pg-promise package with Node.js to execute PostgreSQL queries. I want to see the queries that are executed, but only specific ones; say, just one query that I want to debug.
I can see that one recommended way is to use pg-monitor to catch the events and log them, as mentioned in the examples documentation.
Without using pg-monitor, is there a simple way to just print the prepared query that is executed? I can't see it in the docs.
Example:
db.query("SELECT * FROM table WHERE id = $/id/", {id: 2})
How can I print this query so that it yields the following?
SELECT * FROM table WHERE id = 2
is there a simple way to just print the prepared query that is executed...
A query in general - yes, see below. A Prepared Query - no, those are by definition formatted on the server-side.
const query = pgp.as.format('SELECT * FROM table WHERE id = $/id/', {id: 2});
console.log(query);
await db.any(query);
And if you want to print all queries executed by your module without using pg-monitor, simply add a handler for the query event when initializing the library:
const initOptions = {
    query(e) {
        console.log(e.query);
    }
};
const pgp = require('pg-promise')(initOptions);
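If only some queries should be printed, the same query event handler can filter what it logs. The sketch below is one possible way to do that; makeQueryLogger, its predicate and its sink argument are hypothetical names for this example, and only the query(e) handler with e.query comes from the snippet above. The filter here only looks at e.query when it is a string.

```javascript
// Hypothetical factory: returns initialization options whose query handler
// logs only the queries accepted by the shouldLog predicate.
function makeQueryLogger(shouldLog, sink = console.log) {
    return {
        query(e) {
            if (shouldLog(e.query)) {
                sink(e.query);
            }
        }
    };
}

// Log only statements that mention the (made-up) "users" table.
const initOptions = makeQueryLogger(q => typeof q === 'string' && /\busers\b/i.test(q));
// const pgp = require('pg-promise')(initOptions);

// Simulated events, to show the filter in action without a database:
initOptions.query({ query: 'SELECT * FROM users WHERE id = 2' }); // printed
initOptions.query({ query: 'SELECT * FROM orders' });             // skipped
```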

Massive inserts with pg-promise

I'm using pg-promise and I want to insert multiple rows into one table. I've seen some solutions, like Multi-row insert with pg-promise and How do I properly insert multiple rows into PG with node-postgres?, and I could use pgp.helpers.concat to concatenate multiple queries.
But now I need to insert a lot of measurements into a table, more than 10,000 records, and https://github.com/vitaly-t/pg-promise/wiki/Performance-Boost says:
"How many records you can concatenate like this - depends on the size of the records, but I would never go over 10,000 records with this approach. So if you have to insert many more records, you would want to split them into such concatenated batches and then execute them one by one."
I read the whole article, but I can't figure out how to "split" my inserts into batches and then execute them one by one.
Thanks!
UPDATE
Best is to read the following article: Data Imports.
As the author of pg-promise I was compelled to finally provide the right answer to the question, as the one published earlier didn't really do it justice.
In order to insert a massive/infinite number of records, your approach should be based on method sequence, which is available within tasks and transactions.
var cs = new pgp.helpers.ColumnSet(['col_a', 'col_b'], {table: 'tableName'});
// returns a promise with the next array of data objects,
// while there is data, or an empty array when no more data left
function getData(index) {
    if (/*still have data for the index*/) {
        // - resolve with the next array of data
    } else {
        // - resolve with an empty array, if no more data left
        // - reject, if something went wrong
    }
}
function source(index) {
    var t = this;
    return getData(index)
        .then(data => {
            if (data.length) {
                // while there is still data, insert the next bunch:
                var insert = pgp.helpers.insert(data, cs);
                return t.none(insert);
            }
            // returning nothing/undefined ends the sequence
        });
}
db.tx(t => t.sequence(source))
    .then(data => {
        // success
    })
    .catch(error => {
        // error
    });
This is the best approach to inserting a massive number of rows into the database, from both the performance and the load-throttling points of view.
All you have to do is implement your function getData according to the logic of your app, i.e. where your large data is coming from, based on the index of the sequence, to return some 1,000 - 10,000 objects at a time, depending on the size of objects and data availability.
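As a minimal sketch of one possible getData, assume the rows are already in memory in a hypothetical allRows array and are handed out in pages of a hypothetical PAGE_SIZE, with the sequence index starting at 0. A real data source (a file, a stream, another query) would replace pageOf.

```javascript
const PAGE_SIZE = 1000; // assumed page size; tune it to the size of your records

// Hypothetical in-memory data source, for illustration only.
const allRows = Array.from({ length: 2500 }, (_, i) => ({ col_a: 'a' + i, col_b: 'b' + i }));

// Returns the page of rows for a zero-based index,
// or an empty array once the index is past the end of the data.
function pageOf(rows, index, pageSize) {
    const start = index * pageSize;
    return rows.slice(start, start + pageSize);
}

// Follows the contract described above: a promise that resolves with the next
// array of data objects, or with an empty array when there is no more data.
function getData(index) {
    return Promise.resolve(pageOf(allRows, index, PAGE_SIZE));
}

// The third page is the last, partial one; the fourth is empty and ends the sequence.
getData(2).then(page => console.log(page.length));
getData(3).then(page => console.log(page.length));
```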
See also some API examples:
spex -> sequence
Linked and Detached Sequencing
Streaming and Paging
Related question: node-postgres with massive amount of queries.
And in cases where you need to acquire generated id-s of all the inserted records, you would change the two lines as follows:
// return t.none(insert);
return t.map(insert + ' RETURNING id', [], a => +a.id);
and
// db.tx(t => t.sequence(source))
db.tx(t => t.sequence(source, {track: true}))
Just be careful, as keeping too many record id-s in memory can create an overload.
I think the naive approach would work.
Try to split your data into multiple pieces of 10,000 records or less.
I would try splitting the array using the solution from this post.
Then, multi-row insert each array with pg-promise and execute them one by one in a transaction.
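The splitting step could look like the sketch below. chunk is a hypothetical helper written for this example (the linked post is not reproduced here); it returns consecutive slices of at most size elements.

```javascript
// Hypothetical helper: splits an array into consecutive pieces of at most `size` items.
function chunk(items, size) {
    if (!Number.isInteger(size) || size <= 0) {
        throw new RangeError('size must be a positive integer');
    }
    const chunks = [];
    for (let i = 0; i < items.length; i += size) {
        chunks.push(items.slice(i, i + size));
    }
    return chunks;
}

const splittedData = chunk([1, 2, 3, 4, 5], 2);
console.log(splittedData); // [ [ 1, 2 ], [ 3, 4 ], [ 5 ] ]
```

For the real data, the size argument would be 10,000 or less, as suggested above.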
Edit: Thanks to @vitaly-t for the wonderful library and for improving my answer.
Also, don't forget to wrap your queries in a transaction, or else they will deplete the connections.
To do this, use the batch function from pg-promise to resolve all the queries:
// split your array here to get splittedData, e.g.
// splittedData = [..., [{col_a: 'a1', col_b: 'b1'}, {col_a: 'a2', col_b: 'b2'}]]
var cs = new pgp.helpers.ColumnSet(['col_a', 'col_b'], {table: 'tmp'})
db.tx(function (t) {
    // create one multi-row INSERT per chunk, inside the transaction
    var queries = []
    for (var i = 0; i < splittedData.length; i++) {
        var query = pgp.helpers.insert(splittedData[i], cs)
        queries.push(t.none(query))
    }
    // return the promise, so the transaction waits for all the inserts
    return t.batch(queries)
})
    .then(function (data) {
        // all records inserted successfully!
    })
    .catch(function (error) {
        // error
    });

Inserting multiple records with pg-promise

I have a scenario in which I need to insert multiple records. I have a table with the columns id (a foreign key from another table), key (char) and value (char). The input that needs to be saved is an array of such records.
For example, I have some array objects like:
lst = [];
obj = {};
obj.id= 123;
obj.key = 'somekey';
obj.value = '1234';
lst.push(obj);
obj = {};
obj.id= 123;
obj.key = 'somekey1';
obj.value = '12345';
lst.push(obj);
In MS SQL, I would have created a TVP and passed it. I don't know how to achieve this in Postgres.
So what I want to do is save all the items from the list in a single query in PostgreSQL, using the pg-promise library. I couldn't find any documentation on this, or understand it from the documentation. Any help is appreciated. Thanks.
I am the author of pg-promise.
There are two ways to insert multiple records. The first, and most typical way is via a transaction, to make sure all records are inserted correctly, or none of them.
With pg-promise it is done in the following way:
db.tx(t => {
    const queries = lst.map(l => {
        return t.none('INSERT INTO table(id, key, value) VALUES(${id}, ${key}, ${value})', l);
    });
    return t.batch(queries);
})
    .then(data => {
        // SUCCESS
        // data = array of null-s
    })
    .catch(error => {
        // ERROR
    });
You initiate a transaction with method tx, then create all INSERT query promises, and then resolve them all as a batch.
The second approach is by concatenating all insert values into a single INSERT query, which I explain in detail in Performance Boost. See also: Multi-row insert with pg-promise.
For more examples see Tasks and Transactions.
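To make the idea behind the second approach concrete, the sketch below builds one multi-row INSERT with numbered placeholders and a flat list of values. buildMultiInsert and the key_values table name are hypothetical, written for this example, and the function assumes that the table and column names are trusted constants; in pg-promise itself the helpers described in Performance Boost generate such queries for you.

```javascript
// Hypothetical helper: builds a single multi-row INSERT and its parameter list.
// Table and column names are assumed to be trusted constants, not user input.
function buildMultiInsert(table, columns, rows) {
    if (rows.length === 0) {
        throw new RangeError('at least one row is required');
    }
    const values = [];
    const tuples = rows.map(row => {
        const placeholders = columns.map(col => {
            values.push(row[col]);
            return '$' + values.length; // placeholders are numbered across all rows
        });
        return '(' + placeholders.join(', ') + ')';
    });
    const text = 'INSERT INTO ' + table + '(' + columns.join(', ') + ') VALUES ' + tuples.join(', ');
    return { text, values };
}

const lst = [
    { id: 123, key: 'somekey', value: '1234' },
    { id: 123, key: 'somekey1', value: '12345' }
];
const insertQuery = buildMultiInsert('key_values', ['id', 'key', 'value'], lst);
console.log(insertQuery.text);
// INSERT INTO key_values(id, key, value) VALUES ($1, $2, $3), ($4, $5, $6)
```

The result could then be executed as db.none(insertQuery.text, insertQuery.values), which sends all rows in one statement.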
Addition
It is worth pointing out that in most cases we do not insert a record id, rather have it generated automatically. Sometimes we want to get the new id-s back, and in other cases we don't care.
The examples above resolve with an array of null-s, because batch resolves with an array of individual results, and method none resolves with null, according to its API.
Let's assume that we want to generate the new id-s, and that we want to get them all back. To accomplish this we would change the code to the following:
db.tx(t => {
    const queries = lst.map(l => {
        return t.one('INSERT INTO table(key, value) VALUES(${key}, ${value}) RETURNING id',
            l, a => +a.id);
    });
    return t.batch(queries);
})
    .then(data => {
        // SUCCESS
        // data = array of new id-s;
    })
    .catch(error => {
        // ERROR
    });
i.e. the changes are:
we do not insert the id values
we replace method none with one, to get one row/object from each insert
we append RETURNING id to the query to get the value
we add a => +a.id to do the automatic row transformation. See also pg-promise returns integers as strings to understand what that + is for.
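The effect of the a => +a.id transformation can be seen without a database. The rows below are made-up stand-ins for what the driver might return when the id column is a 64-bit type and therefore arrives as a string:

```javascript
// Made-up rows: a 64-bit id column typically arrives as a string.
const rows = [{ id: '101' }, { id: '102' }];

// Unary plus converts each numeric string to a JavaScript number.
const ids = rows.map(a => +a.id);
console.log(ids); // [ 101, 102 ]

// Caveat: values above Number.MAX_SAFE_INTEGER cannot be represented exactly.
console.log(Number.isSafeInteger(+'9007199254740993')); // false
```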
UPDATE-1
For a high-performance approach via a single INSERT query see Multi-row insert with pg-promise.
UPDATE-2
A must-read article: Data Imports.

Node-Postgres SELECT WHERE IN dynamic query optimization

We're working on a Node/Express web app with a Postgres database, using the node-postgres package. We followed the instructions in this question, and have our query working written this way:
exports.getByFileNameAndColName = function query(data, cb) {
    const values = data.columns.map(function map(item, index) {
        return '$' + (index + 2);
    });
    const params = [];
    params.push(data.fileName);
    data.columns.forEach(function iterate(element) {
        params.push(element);
    });
    db.query('SELECT * FROM columns ' +
        'INNER JOIN files ON columns.files_id = files.fid ' +
        'WHERE files.file_name = $1 AND columns.col_name IN (' + values.join(', ') + ')',
        params, cb
    );
};
data is an object containing a string fileName and an array of column names columns.
We want this query to extract information from our 'columns' and 'files' tables from a dynamic number of columns.
db.query takes as parameters (query, args, cb), where query is the SQL query, args is an array of parameters to pass into the query, and cb is the callback function executed with the database results.
So the code written in this way returns the correct data, but (we think) it's ugly. We've tried different ways of passing the parameters into the query, but this is the only format that has successfully returned data.
Is there a cleaner/simpler way to pass in our parameters? (e.g. any way to pass parameters in a way the node-postgres will accept without having to create an additional array from my array + non-array elements.)
Asking this because:
perhaps there's a better way to use the node-postgres package/we're using it incorrectly, and
if this is the correct way to solve this type of issue, then this code supplements the answer in the question referenced above.
Hello. I tried to interpret what you meant by "but (we think) it's ugly", and I believe my response answers your question.
In the same question you reference, you will find a response in which the user uses pg-promise with its special-case variable formatting.
In your case, it may look something like the following, which uses a shared connection. For your example I would actually recommend a plain db.query; I'm using the shared connection only to show how I extended the "ugly" version:
exports.getByFileNameAndColName = function query(data, cb) {
    var sco; // shared connection object
    db.connect()
        .then(function (obj) {
            sco = obj;
            // $2^ injects the pre-formatted CSV of column names as raw text
            return sco.query('SELECT * FROM columns ' +
                'INNER JOIN files ON columns.files_id = files.fid ' +
                'WHERE files.file_name = $1 AND columns.col_name IN ($2^)',
                [data.fileName, pgp.as.csv(data.columns)]);
        })
        .then(function (rows) {
            // assumes a Node-style callback: cb(error, result)
            cb(null, rows);
        }, function (reason) {
            console.log(reason);
            cb(reason);
        })
        .finally(function () {
            if (sco) {
                sco.done(); // release the connection
            }
        });
};
Now, again, I'm not sure what you meant by ugly, but in my use case the return format was something like this:
{
    column:[
        {
            id: data,
            data: data,
            col_name: data,
            files_id: data,
            fid: data,
            files_name: data
        },...
    ]
}
And in my case I really wanted this:
{
    column:[
        {
            id: data,
            data: data,
            col_name: data,
            files_id: data,
        },...
    ],
    file:[
        {
            fid: data,
            files_name: data
        },...
    ]
}
So, in order to do that, I took the same shared connection and added an extra variable to manage the results. This may not answer your question, or I might just be on to something, but I suggest looking into pg-promise; it could be helpful for advanced queries and formatting.
My question was asking if there was a way to use the node-postgres library in way that cleaned up our params creation code before the query. However, from the several deleted answers as well as the remaining one, it seems like we're being ornery and those few extra lines aren't that big of a deal and that this is the best way to write this code. So, I'm marking this question "answered," although now it appears that it wasn't the greatest question and perhaps we shouldn't have asked it in the first place.
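For readers who land here with the same problem: one alternative that is sometimes used with node-postgres is to pass the whole array as a single parameter and compare with = ANY($2) instead of generating one placeholder per element. The sketch below only builds the query text and parameters; buildColumnsQuery is a hypothetical helper, and the approach assumes that the driver sends a JavaScript array as a PostgreSQL array, so it should be verified against your driver version.

```javascript
// Hypothetical helper: builds the query with a fixed number of parameters,
// passing the list of column names as one array-valued parameter.
function buildColumnsQuery(data) {
    const text =
        'SELECT * FROM columns ' +
        'INNER JOIN files ON columns.files_id = files.fid ' +
        'WHERE files.file_name = $1 AND columns.col_name = ANY($2)';
    // Two parameters, regardless of how many column names there are.
    const params = [data.fileName, data.columns];
    return { text, params };
}

const columnsQuery = buildColumnsQuery({ fileName: 'report.csv', columns: ['price', 'qty'] });
console.log(columnsQuery.params.length); // 2
// db.query(columnsQuery.text, columnsQuery.params, cb);
```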
