Query parameter in LIKE statement causes slow response - node.js

I'm trying to query Google Spanner with query parameters using the Node.js client library.
However, the response is much slower with a query parameter than without one.
The query has a LIKE (prefix match) predicate. I couldn't find a recommended way to use query parameters with a LIKE statement.
Additionally, I tested with an equality predicate, and there is no difference between the query with a parameter and the one without.
The table has more than 20 million rows, and the instance has 1 node.
Is there any solution, or is this a bug in Google Spanner?
Part of the schema (the table actually has more than 40 columns):
CREATE TABLE props (
    props__id STRING(MAX) NOT NULL,
    props__address_quadkey STRING(MAX),
    ...
) PRIMARY KEY (props__id)
Index:
CREATE INDEX props__address_quadkey
ON props (
    props__address_quadkey
)
Test code:
const Spanner = require('@google-cloud/spanner');

const spanner = new Spanner();
const db = spanner
    .instance('instance_name')
    .database('database_name');

(async () => {
    // Make connection
    await db.run({ sql: 'SELECT 1' });

    console.time('Without param');
    const r1 = (await db.run({
        sql: `
            SELECT
                props__id
            FROM props@{FORCE_INDEX=props__address_quadkey}
            WHERE
                (props__address_quadkey LIKE '1330020303011010200%')
        `
    }))[0];
    console.log(r1.length); // 121
    console.timeEnd('Without param'); // Without param: 277.223ms

    console.time('with param 1');
    const r2 = (await db.run({
        sql: `
            SELECT
                props__id
            FROM props@{FORCE_INDEX=props__address_quadkey}
            WHERE
                (props__address_quadkey LIKE @quadkey)
        `,
        params: { quadkey: '1330020303011010200%' },
        types: { quadkey: 'string' },
    }))[0];
    console.log(r2.length); // 121
    console.timeEnd('with param 1'); // with param 1: 9240.822ms
})();
Thank you for your help!

This is currently a limitation of Cloud Spanner. With a constant value for the LIKE pattern, Cloud Spanner is able to optimize the lookup expression based on the LIKE pattern during query compilation. For example, in this case, Cloud Spanner will be able to generate a query plan with a lookup expression that is basically
STARTS_WITH(props__address_quadkey, '1330020303011010200')
which will be able to efficiently search the index for entries that match the prefix in the LIKE pattern.
But with a parameterized LIKE pattern, that is not possible, as the parameter is not evaluated until execution time and could contain any LIKE expression. As a result, instead of being able to efficiently look up the matching rows, Cloud Spanner must read all rows and evaluate them against the LIKE pattern in the parameter to filter out non-matching rows.
This limitation, however, does not affect simpler predicates such as the equality predicate, where Cloud Spanner is able to do efficient lookups based on the value of the parameter.
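If the pattern is always a prefix match, the usual workaround is to express that intent directly with STARTS_WITH, which keeps the prefix semantics known at compile time even when the value arrives as a parameter. Below is a sketch adapted from the test code above (the r3 naming and the timing labels are just for illustration):

console.time('with STARTS_WITH param');
const r3 = (await db.run({
    sql: `
        SELECT
            props__id
        FROM props@{FORCE_INDEX=props__address_quadkey}
        WHERE STARTS_WITH(props__address_quadkey, @quadkey)
    `,
    // Pass the raw prefix only, without the trailing '%'
    params: { quadkey: '1330020303011010200' },
    types: { quadkey: 'string' },
}))[0];
console.timeEnd('with STARTS_WITH param');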

Related

BigQuery NodeJS SDK not creating native GEOGRAPHY type

First, simple BigQuery SQL:
We're trying to take the following runnable BigQuery SQL query and convert it to a parameterized query to execute in Node.js:
SELECT * FROM UNNEST([
    STRUCT(
        ST_GEOGFROMTEXT('POINT(1 2)') AS lnglat,
        TIMESTAMP('2020-01-01') AS stamp
    )
])
The query simply builds a pseudo-table from an array of STRUCTs. Most notably, the output types match what you'd expect: the stamp column is a BigQuery TIMESTAMP type, and lnglat is a BigQuery GEOGRAPHY type.
Now, let's try in Node.js.
Let's substitute the array of BigQuery STRUCTs above for @points, and pass a JavaScript array of object literals as params:
// this is version 5.3.0
const { BigQuery, Geography } = require('@google-cloud/bigquery');

const bigquery = new BigQuery();

(async () => {
    const query = 'SELECT * from UNNEST(@points)';
    const params = { points: [
        {
            lnglat: new Geography('POINT(1 2)'),
            stamp: BigQuery.timestamp('2020-01-01')
        }
    ] };

    const [job] = await bigquery.createQueryJob({ query, params });

    // Wait for the query to finish
    const [rows] = await job.getQueryResults();

    // Print the results
    console.log('Rows:');
    console.log(rows);
})();
Returns the following result on my CLI:
> node index.js
Rows:
[
  {
    lnglat: { value: 'POINT(1 2)' },
    stamp: BigQueryTimestamp { value: '2020-01-01T00:00:00.000Z' }
  }
]
The problem is, despite the NodeJS SDK containing docs around "Geography" here, here, and here, none of these methods seem to actually force BigQuery to construct a native BigQuery GEOGRAPHY type inside of BigQuery.
It seems, instead, that BigQuery interprets the new Geography() as a RECORD type with a value field, as indicated in the response above and also verified by inspecting the temporary (anon) table that is created in the BigQuery UI.
We've tried different variants of geography functions/classes: Geography, BigQuery.Geography, and bigquery.Geography; they all return the same RECORD type.
Strangely, if we instead query an existing table (as opposed to constructing a pseudo-table at runtime), the result is more consistent with what I would expect:
Rows:
[ { lnglat: Geography { value: 'POINT(-118.43356046 45.97057312)' } } ]
Note the Geography type in the response!
We are aware that we can fall back to specifying lnglat as a JavaScript string literal, and the following SQL will convert it into a native GEOGRAPHY by wrapping the query in a CTE:
WITH points AS (
    SELECT * from UNNEST(@points)
)
SELECT * EXCEPT(lnglat), ST_GEOGFROMTEXT(lnglat) AS lnglat FROM points
But unfortunately, we want to use this pseudo-table as a filter against a much larger on-disk table, and using this CTE wrapper eliminates the ability of that query (not illustrated here) to leverage clustering. Clustering is very important for cost savings and execution performance. I can elaborate more on this if you request.
At the end of the day, it still doesn't explain why native GEOGRAPHYs are not materializing in the pseudo-table.
Question:
How do we use BigQuery NodeJS SDK to construct a native BigQuery GEOGRAPHY type, similar to what we can do with BigQuery.timestamp() (above), without CTEs?
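One avenue worth trying is the types option that the Node.js client accepts alongside params, declaring the struct fields explicitly and passing the WKT as a plain string. This is a hedged sketch: that a nested GEOGRAPHY declaration materializes a native GEOGRAPHY column in the pseudo-table is an assumption here, not a verified result.

const { BigQuery } = require('@google-cloud/bigquery');
const bigquery = new BigQuery();

(async () => {
    const query = 'SELECT * from UNNEST(@points)';
    // Plain string values; the declared types below do the coercion
    const params = { points: [
        { lnglat: 'POINT(1 2)', stamp: '2020-01-01' }
    ] };
    // Explicit parameter types: an array of structs with a GEOGRAPHY field
    const types = { points: [{ lnglat: 'GEOGRAPHY', stamp: 'TIMESTAMP' }] };

    const [job] = await bigquery.createQueryJob({ query, params, types });
    const [rows] = await job.getQueryResults();
    console.log(rows);
})();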

How to: Sequential db.batch with pg-promise

I cannot figure out how to execute a batch call of generated queries sequentially.
I am trying to truncate every table in the DB. My code:
db.any(`
    SELECT table_name
    FROM information_schema.tables
    WHERE table_schema = 'public'
    AND table_type = 'BASE TABLE';
`)
    .then(res => res.map(item => item.table_name)) // To get only an array with the names
    .then(tables => tables.map(tableName => db.none(`TRUNCATE TABLE ${tableName} CASCADE`))) // ES6 template strings, because the table name must be bare here (no quotes)
    .then(queries => db.tx(t => t.batch(queries)))
I get deadlock detected errors. It's clear why I am getting deadlocks: the queries cascade and try to truncate the same table as another query. That's why I need to run the queries sequentially, but I can't figure out how. I tried using db.sequence(), but I was getting the same errors. What is the proper way to execute generated queries sequentially with pg-promise? Thanks a lot.
The syntax supported by pg-promise is very flexible. Below is just one such syntax, the easiest to use for your case and the most modern one:
await db.tx(async t => {
    const tables = await t.map(`
        SELECT table_name
        FROM information_schema.tables
        WHERE table_schema = $1
        AND table_type = $2
    `, ['public', 'BASE TABLE'], a => a.table_name);
    for (let i = 0; i < tables.length; i++) {
        await t.none('TRUNCATE TABLE $1:name CASCADE', tables[i]);
    }
});
Regarding "ES6 template strings, because the table name must be bare here (no quotes)": that is wrong. Names must be in double quotes, which we provide with the SQL Names filter.
Also see from here:
Never use the reserved ${} syntax inside ES6 template strings, as those have no knowledge of how to format values for PostgreSQL.
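For illustration, the effect of the :name filter can be previewed with pgp.as.format; a minimal sketch, where pgp is the initialized pg-promise library object:

const pgp = require('pg-promise')();

// The :name filter wraps the identifier in double quotes (escaping any
// embedded quotes), which is what makes injecting table names safe here
const sql = pgp.as.format('TRUNCATE TABLE $1:name CASCADE', 'my_table');
console.log(sql); //=> TRUNCATE TABLE "my_table" CASCADE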

Log specific postgresql query using pg-promise

I am using the pg-promise package with Node.js to execute PostgreSQL queries. I want to see the queries being executed, but only specific ones, say, just one query that I want to debug.
I can see that one recommended way is to use pg-monitor to catch the events and log them, as mentioned here in the examples documentation.
Without using pg-monitor, is there a simple way to just print the prepared query that is executed? I can't see it in the docs.
Example:
db.query("SELECT * FROM table WHERE id = $/id/", {id: 2})
How can I print this query so that it yields the following?
SELECT * FROM table WHERE id = 2
Regarding "is there a simple way to just print the prepared query that is executed...":
A query in general: yes, see below. A prepared query: no, as those are by definition formatted on the server side.
const query = pgp.as.format('SELECT * FROM table WHERE id = $/id/', {id: 2});
console.log(query);
await db.any(query);
And if you want to print all queries executed by your module, without using pg-monitor, simply add a query event handler when initializing the library:
const initOptions = {
    query(e) {
        console.log(e.query);
    }
};
const pgp = require('pg-promise')(initOptions);
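And to log only one specific query rather than all of them, the same handler can filter on the query text; a small sketch, where the matching condition is just an assumed way to recognize the query being debugged:

const initOptions = {
    query(e) {
        // e.query is the fully formatted query string at this point
        if (e.query.startsWith('SELECT * FROM table')) {
            console.log(e.query);
        }
    }
};

const pgp = require('pg-promise')(initOptions);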

Node-Postgres SELECT WHERE IN dynamic query optimization

We're working on a Node/Express web app with a Postgres database, using the node-postgres package. We followed the instructions in this question, and have our query working when written this way:
exports.getByFileNameAndColName = function query(data, cb) {
    const values = data.columns.map(function map(item, index) {
        return '$' + (index + 2);
    });

    const params = [];
    params.push(data.fileName);
    data.columns.forEach(function iterate(element) {
        params.push(element);
    });

    db.query('SELECT * FROM columns ' +
        'INNER JOIN files ON columns.files_id = files.fid ' +
        'WHERE files.file_name = $1 AND columns.col_name IN (' + values.join(', ') + ')',
        params, cb
    );
};
data is an object containing a string fileName and an array of column names columns.
We want this query to extract information from our 'columns' and 'files' tables for a dynamic number of column names.
db.query takes as parameters (query, args, cb), where query is the SQL query, args is an array of parameters to pass into the query, and cb is the callback function executed with the database results.
So the code written in this way returns the correct data, but (we think) it's ugly. We've tried different ways of passing the parameters into the query, but this is the only format that has successfully returned data.
Is there a cleaner/simpler way to pass in our parameters? (E.g., is there a way to pass parameters that node-postgres will accept without having to create an additional array from our array plus non-array elements?)
Asking this because:
perhaps there's a better way to use the node-postgres package/we're using it incorrectly, and
if this is the correct way to solve this type of issue, then this code supplements the answer in the question referenced above.
Hello, I tried to interpret "but (we think) it's ugly", and I believe my response answers your question.
In the same question you reference, you will find a response in which the user combines pg-promise with special-case variable formatting.
In your case it may look something like the following, using a shared connection, though in your example I would actually recommend using a plain db.query; I'm just using the shared connection to show how I extended the "ugly" version:
exports.getByFileNameAndColName = function query(data, cb) {
    var sco;
    const params = [];
    data.columns.forEach(function iterate(element) {
        params.push(element);
    });

    db.connect()
        .then(function (obj) {
            sco = obj;
            // $1 is the file name; $2^ injects the pre-formatted CSV of
            // column values as raw text inside IN (...)
            return sco.query('SELECT * FROM columns ' +
                'INNER JOIN files ON columns.files_id = files.fid ' +
                'WHERE files.file_name = $1 AND columns.col_name IN ($2^)',
                [data.fileName, pgp.as.csv(params)]);
        }, function (reason) {
            console.log(reason);
        })
        .done(function () {
            if (sco) {
                sco.done();
                cb();
            }
        });
};
Now again, I'm not sure what you meant by ugly, but in my use case the return format was something like this:
{
    column: [
        {
            id: data,
            data: data,
            col_name: data,
            files_id: data,
            fid: data,
            files_name: data
        }, ...
    ]
}
And in my case I really wanted this:
{
    column: [
        {
            id: data,
            data: data,
            col_name: data,
            files_id: data,
        }, ...
    ],
    file: [
        {
            fid: data,
            files_name: data
        }, ...
    ]
}
So in order to do that, I took the same shared connection and added an extra variable to manage the results. Now, this may not answer your question, or I might just be on to something, but I suggest looking into pg-promise; it could be helpful for advanced queries and formatting.
My question was asking whether there was a way to use the node-postgres library that cleaned up our parameter-creation code before the query. However, judging from the several deleted answers as well as the remaining one, it seems we're being ornery and those few extra lines aren't that big of a deal, and this is the best way to write this code. So I'm marking this question answered, although it now appears that it wasn't the greatest question and perhaps we shouldn't have asked it in the first place.
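As an aside, a commonly used cleaner form that stays within node-postgres is to pass the whole JavaScript array as a single parameter and match it with = ANY, which removes the need to generate placeholders; a minimal sketch, assuming the same db and data shapes as above:

exports.getByFileNameAndColName = function query(data, cb) {
    // node-postgres serializes a JS array into a Postgres array, which
    // = ANY($2) matches against, replacing the generated IN (...) list
    db.query('SELECT * FROM columns ' +
        'INNER JOIN files ON columns.files_id = files.fid ' +
        'WHERE files.file_name = $1 AND columns.col_name = ANY($2)',
        [data.fileName, data.columns], cb
    );
};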

How to properly escape raw SQL query (plainto_tsquery) in Postgres / node

I'm writing a raw SQL query to implement Postgres full text search in my node backend. I've looked through the official docs, which state:
plainto_tsquery transforms unformatted text querytext to tsquery. The text is parsed and normalized much as for to_tsvector, then the & (AND) Boolean operator is inserted between surviving words.
but I'm not familiar enough with all the different SQL injection techniques to know for certain whether the following will be properly escaped:
'SELECT * FROM "Products" WHERE "catalog_ts_vector" ## plainto_tsquery(\'english\', ' + search_term + ')'
The user will be able to enter whatever search_term they want via the URI.
Do I need to do further escaping/manipulation, or is this functionality fully baked into plainto_tsquery() and other Postgres safeguards?
Edit
As a side note, I plan to strip out most non-alphanumeric characters (including parentheses) with .replace(/[^\w-_ .\&]|\(\)/g, ' '); that should go a long way, but I'm still curious if this is even necessary.
Most likely you're using the pg module as your PostgreSQL client for node.js. In this case you don't need to worry about SQL injection; pg prevents it for you. Just don't use string concatenation to create the query; use parameterized queries (or prepared statements):
var sql = 'SELECT * FROM "Products" WHERE "catalog_ts_vector" @@ plainto_tsquery(\'english\', $1)';
var params = [search_term];
client.query(sql, params, function (err, result) {
    // handle error and result here
});
Also look at the Prepared Statements part of the pg wiki and the PostgreSQL PREPARE statement.
UPD: As for Sequelize: it uses the pg module by default, but you can specify your preferred pg client in the dialectModulePath config parameter (see here). You can use parameterized queries in Sequelize too, and even better, you can use named parameters. So your code will be:
var sql = 'SELECT * FROM "Products" WHERE "catalog_ts_vector" @@ plainto_tsquery(\'english\', :search_term)';
var params = { search_term: search_term };
sequelize.query(sql, Product, null, params).then(function (products) {
    // handle your products here
});
Where Product is your sequelize product model.
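For reference, newer Sequelize versions express the same named-parameter call through an options object instead of the positional arguments above; a minimal sketch of that form, assuming a recent enough Sequelize that QueryTypes is exported by the package:

const { QueryTypes } = require('sequelize');

var sql = 'SELECT * FROM "Products" ' +
    'WHERE "catalog_ts_vector" @@ plainto_tsquery(\'english\', :search_term)';

sequelize.query(sql, {
    // named parameters are bound via replacements
    replacements: { search_term: search_term },
    type: QueryTypes.SELECT,
}).then(function (products) {
    // handle your products here
});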
