Knex + SQL Server whereIn query takes 8-12s -- raw version returns NO results, but if I input the .toQuery() result directly I get results - node.js

The database is in Azure and not currently used in production. There are 80,000 rows, and uprn is a VARCHAR(100).
I'm already validating each UPRN with Joi as well.
I'm using Knex with a SQL Server database and the following whereIn query:
knex(LOCATIONS.table).whereIn(LOCATIONS.uprn, req.body.uprns)
but this takes 8-12s to complete and sometimes times out. If I paste the output of .toQuery() into SSMS, it returns the result within 1-2 seconds.
If I build a raw query, the resulting .toQuery() or .toString() output works in SSMS and returns results. But if I run the raw query directly through Knex, it returns 0 results.
I'm looking to either fix whatever is making whereIn so slow or get the raw query working.
EDIT 1:
After much debugging and trying, it seems the bug is in how Knex deals with arrays, so I wrote a for-of loop that adds a ? placeholder for each array element and then passed the array as the params.
This led me to realise the performance issue is due to the way SQL Server parameterises queries.
I ended up building a raw query string with all of the parameters inlined and validating the input with this Joi string/regex config:
Joi.string()
.min(1)
.max(35)
.regex(/^[a-z\d\-_\s]+$/i)
allowing only alphanumeric characters, dashes, underscores and spaces, which should prevent SQL injection.
I'm going to look deeper into security issues with this and might make a separate login that can only SELECT data from that table and nothing more to run with these queries.
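For illustration, a minimal sketch of what EDIT 1 describes -- validate each UPRN against the Joi schema, then inline the values into a raw IN list. The uprnSchema and buildUprnQuery names are hypothetical; LOCATIONS is the constant from the question, and a recent Joi where schema.validate() returns { error } is assumed:

const Joi = require('joi');

// Schema from EDIT 1: alphanumeric, dashes, underscores and spaces only
const uprnSchema = Joi.string().min(1).max(35).regex(/^[a-z\d\-_\s]+$/i);

function buildUprnQuery(knex, uprns) {
  for (const uprn of uprns) {
    const { error } = uprnSchema.validate(uprn);
    if (error) throw new Error(`Invalid UPRN: ${uprn}`);
  }
  // Inline the validated values instead of binding parameters,
  // sidestepping SQL Server's slow parameterised IN list
  const inList = uprns.map((u) => `'${u}'`).join(', ');
  return knex.raw(
    `SELECT * FROM ${LOCATIONS.table} WHERE ${LOCATIONS.uprn} IN (${inList})`
  );
}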

Needed to just handle it raw and validate separately.

Related

AND occasionally produces wrong result on shell and node native driver

I've built a dynamic query generator that creates my desired queries based on many factors; however, in rare cases it acted weird. After a day of reading logs I found a situation that can be simplified to this:
db.users.find({att: 'a', att: 'b'})
What I expect is that MongoDB uses AND by default, so the above query's result should be an empty array. However, it's not!
But when I use AND explicitly, the result is an empty array
db.users.find({$and: [{att: 'a'}, {att: 'b'}]})
In JavaScript an object's keys must be unique; otherwise the value is replaced by the latest one
(the mongodb shell is based on JS, so it follows JS rules):
const t = {att: 'a', att: 'b'};
console.log(t); // { att: 'b' }
So in your case your query is acting like this:
db.users.find({att: 'b'})
You have to handle this situation in your own code if you want the result to be empty in the mentioned condition.
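For example, a minimal sketch of one way to handle it, assuming the generator produces conditions as [field, value] pairs (the buildQuery helper is hypothetical): always combine them with an explicit $and, so repeated fields survive.

// Hypothetical helper: combine [field, value] pairs with an explicit $and
function buildQuery(conditions) {
  return { $and: conditions.map(([field, value]) => ({ [field]: value })) };
}

buildQuery([['att', 'a'], ['att', 'b']]);
// -> { $and: [ { att: 'a' }, { att: 'b' } ] }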

Node-Postgres/Knex returning CITEXT[] as a string in JS, instead of an array of strings

I'm using Knex, which itself uses package "pg" (aka "node-postgres").
If you SELECT some rows from a table with a TEXT[] column, all is well... in JS you get an array of strings.
But if you're using a CITEXT[] column, instead you just get back a string in JS like:
"{First-element,Second-element}"
Normally when you want to instruct the pg package on how to return specific postgres types, you can do something like this:
import {types} from 'pg';
// Leave these date/time types as plain strings instead of parsed values
types.setTypeParser(types.builtins.TIMESTAMPTZ, 'text');
types.setTypeParser(types.builtins.TIMESTAMP, 'text');
types.setTypeParser(types.builtins.DATE, 'text');
types.setTypeParser(types.builtins.TIME, 'text');
types.setTypeParser(types.builtins.TIMETZ, 'text');
The types.builtins.* constants are hardcoded OID numbers for the known built-in postgres types; those OIDs are the same across all postgres installations.
However, because citext comes from an extension, the OIDs for the CITEXT and CITEXT[] types are different on every server. For example, with the following SQL query:
SELECT typname, oid, typarray FROM pg_type WHERE typname like '%citext%';
On my development server I get:
typname | oid   | typarray
--------|-------|---------
citext  | 17459 | 17464
_citext | 17464 | 0
But on my production server I get:
typname | oid   | typarray
--------|-------|---------
citext  | 18618 | 18623
_citext | 18623 | 0
How can I solve this?
Some hacky options that I really don't want to use:
Find out the OID values for all my servers and hard-code them in -- very hacky.
Write code for every table/column that manually converts the strings to arrays -- also hacky and repetitive.
When the node process initialises, query the server for its OID values and then call types.setTypeParser() with those dynamic values -- also not great.
How can I solve this for all tables/columns without these hacks?
I don't believe there is any way to do it without querying the DB.
I would probably query for the correct OID before starting the node app, store it in an environment variable, and then initialise the pg type parsers from process.env.
That is also a bit hacky, but at least the hack is mostly encapsulated outside the application code.
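As an illustration, a minimal sketch of the startup-lookup variant (a configured pg Pool is assumed; reusing pg's built-in text[] parser for citext[] is an assumption that holds because both are arrays of strings):

const { Pool, types } = require('pg');

async function registerCitextArrayParser(pool) {
  // Look up the citext array OID on this particular server
  const { rows } = await pool.query(
    "SELECT typarray FROM pg_type WHERE typname = 'citext'"
  );
  if (rows.length === 0) return; // citext extension not installed
  // 1009 is the fixed OID of text[]; reuse its parser for citext[]
  types.setTypeParser(rows[0].typarray, types.getTypeParser(1009));
}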

Multi insert inside a QueryFile

I'm able to generate queries for multi-row inserts or updates thanks to the pg-promise helpers, but I was wondering if I could follow the author's advice and keep all queries outside of my JavaScript code (see https://github.com/vitaly-t/pg-promise/wiki/SQL-Files and https://github.com/vitaly-t/pg-promise-demo).
When I use the insert helpers, the generated query looks like:
INSERT INTO "education"("candidate_id","title","content","degree","school_name","start_date","still_in","end_date","picture_url") VALUES('6','My degree','Business bachelor','Bachelor +','USC','2018-05-15T02:00:00.000+02:00'::date,false,null::date,null),('6','Another degree','Engineering','Master degree','City University','2018-05-15T02:00:00.000+02:00'::date,false,null::date,null)
The idea is that I don't know in advance how many rows I need to insert, so it has to be dynamic.
The following code doesn't work because I'm passing an array of objects instead of a single object:
db.none(`INSERT INTO "education"("candidate_id","title","content","degree","school_name","start_date","still_in","end_date","picture_url")
VALUES($<candidate_id>, $<title>, $<content>, $<degree>, $<school_name>, $<start_date>, $<still_in>, $<end_date>, $<picture_url>)`, data)
This code spreads the object but still doesn't produce a proper query:
db.none(`INSERT INTO "education"("candidate_id","title","content","degree","school_name","start_date","still_in","end_date","picture_url")
VALUES($1:list)`,
[data])
Any ideas? Is it even possible, or when I don't know in advance how many records I want to insert, do I have to call pgp.helpers every time?
You are confusing static and dynamic SQL. SQL files are for queries that are mostly static: you can still inject quite a lot dynamically, but when most of the query is dynamic, there is no longer any point in putting it into an SQL file.
The helpers namespace, on the other hand, exists for dynamic queries. So you are asking about two separate things, trying to join things that do not need to be joined.
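For reference, a minimal sketch of the dynamic route with the helpers namespace (column names taken from the question; db and the data array of row objects are assumed to exist):

const pgp = require('pg-promise')();

// Reusable ColumnSet for the education table
const cs = new pgp.helpers.ColumnSet([
  'candidate_id', 'title', 'content', 'degree', 'school_name',
  'start_date', 'still_in', 'end_date', 'picture_url'
], { table: 'education' });

// helpers.insert generates the whole multi-row INSERT,
// however many rows the data array contains
const insert = pgp.helpers.insert(data, cs);
db.none(insert);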

KnexJS giving different response

So I have a NodeJS + KnexJS setup on a PostgreSQL DB and am using the .whereRaw() method so I can use a CASE statement in my WHERE clause.
The query was tested in my CLI before being moved into code. Here is the code in question:
var qry = knex.select(....); // ignore the select, not important.
qry.with('daspecs', function(qy) {
qy.select('spec_id').from('drawings').where('uid', query.d);
}).whereRaw('CASE WHEN (select "spec_id" from "daspecs") IS NULL THEN true ELSE c.spec_id = (select "spec_id" from "daspecs") END');
The SQL that KnexJS generates (output via qry.toString()) is correct, and I can even copy and paste it into my psql CLI and it returns the results I want (12 records). But for some weird reason the KnexJS query returns a completely different set of results (1106 records).
Not sure where to go next, since KnexJS is giving me the right SQL but seems to be executing something else, and I'm not sure how else to diagnose what it is actually doing (I've tried the knex.on('query') event, as sketched below).
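For reference, roughly what that inspection looks like (I believe qry.toSQL().toNative() returns the dialect-native SQL and bindings, and the 'query' event reports each executed statement):

// Log every query knex actually hands to the driver, with its bindings
knex.on('query', (data) => console.log(data.sql, data.bindings));

// Inspect the dialect-native SQL and bindings before execution
const { sql, bindings } = qry.toSQL().toNative();
console.log(sql, bindings);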
Any alteration to the final SQL results in an error (I've tested), so I'm at the point of ruling out missing pieces.
Has anyone had any experience or issues with KnexJS saying one thing but doing another, in particular with the whereRaw command?

Resource Conflict after syncing with PouchDB

I am new to CouchDB / PouchDB and until now I have somehow managed the start of it all. I am using the couchdb-python library to send initial values to CouchDB before I start developing the actual application. I have one database with templates of the data I want to include, and the actual database for all the data the application will use.
couch = couchdb.Server()
templates = couch['templates']
couch.delete('data')         # drop the old data database
data = couch.create('data')  # and recreate it empty
In Python I have a loop in which I send one value after another to CouchDB:
value = templates['Template01']
value.update({'_id': 'Some ID'})
value.update({'Other Attribute': 'Some Value'})
...
data.save(value)
It was working fine the whole time; I needed to run this several times as my data had to be adjusted. After I was satisfied with the results, I started building my application in JavaScript. I synced PouchDB with the data database and that was also working. However, I found I needed to change something in the Python code, so I ran the first Python script again, but now I get this error:
couchdb.http.ResourceConflict: (u'conflict', u'Document update conflict.')
I tried to destroy() the PouchDB database data and delete the CouchDB database as well, but I still get this error at this part of the code:
data.save(value)
What I also don't understand is that a few values are actually written to the database before this error appears, so some values do get save()d into the db.
I read that it has something to do with the _rev values of the documents, but I cannot find an answer. Hope someone can help here.
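Along the lines of that _rev hypothesis, a minimal sketch of the usual pattern on the PouchDB side (the same idea applies to the Python save): fetch the document's current _rev and carry it over before writing, so the update is not treated as a conflicting fresh insert. The upsert name is hypothetical.

// Hypothetical upsert: carry over the existing _rev to avoid a conflict
async function upsert(db, doc) {
  try {
    const existing = await db.get(doc._id);
    doc._rev = existing._rev; // reuse the current revision
  } catch (err) {
    if (err.status !== 404) throw err; // 404 means new doc: plain insert
  }
  return db.put(doc);
}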
