Using node-postgres I want to update columns in my user model. At present I have this:
async function update_user_by_email(value, column, email) {
  const sql = format('UPDATE users SET %I = $1 WHERE email = $2', column);
  await pool.query(sql, [value, email]);
}
So I can do this:
await update_user_by_email(value, column_name, email_address);
However, if I want to update multiple columns and values, I am currently doing something very inefficient: calling that method once per column, i.e. one query per update:
await update_user_by_email(value, column_name, email_address);
await update_user_by_email(value_2, column_name_2, email_address);
await update_user_by_email(value_3, column_name_3, email_address);
How can I do this with just one call to the database?
Thanks
You have a few options here:
node-postgres allows you to create queries based on prepared statements. (This builds on PostgreSQL's native prepared statements.)
These are recommended by PostgreSQL for populating a table, as a secondary option to using its COPY command. You would end up running more SQL statements (probably one per column), but the advantages of prepared statements are supposed to offset this somewhat.
You can also combine this with transactions, which are also mentioned in the PostgreSQL documentation on populating a database.
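A minimal sketch of that first option, assuming the same pool and pg-format format used in the question: each column is still its own statement, but they run as named prepared statements inside a single transaction on one connection.

async function update_user_columns(updates, email) {
  // updates: e.g. [{ column: 'first_name', value: 'Ada' }, { column: 'last_name', value: 'Lovelace' }]
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    for (const { column, value } of updates) {
      await client.query({
        // Naming the query makes node-postgres prepare it once per connection
        name: `update-users-${column}`,
        text: format('UPDATE users SET %I = $1 WHERE email = $2', column),
        values: [value, email],
      });
    }
    await client.query('COMMIT');
  } catch (err) {
    await client.query('ROLLBACK');
    throw err;
  } finally {
    client.release();
  }
}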
Another option is the approach taken by another library called pg-promise (specifically its helpers). The pg-promise helpers library literally builds the SQL statements (as strings) for bulk insert/update statements. That way you can have a single statement to update/insert thousands of rows at a time.
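A rough sketch of how that could look for the multi-column update in the question, assuming pg-promise's helpers.update and as.format behave as documented, and reusing the node-postgres pool from the question to send the query:

const pgp = require('pg-promise')(); // used here only for its SQL-building helpers

async function update_user_by_email(changes, email) {
  // changes is a plain object, e.g. { first_name: 'Ada', last_name: 'Lovelace' }
  const setClause = pgp.helpers.update(changes, null, 'users'); // builds the UPDATE ... SET ... part as a string
  const where = pgp.as.format(' WHERE email = $1', [email]);    // values are escaped by pg-promise
  await pool.query(setClause + where);
}

A single call such as update_user_by_email({ first_name: 'Ada', last_name: 'Lovelace' }, email_address) then issues one UPDATE covering all the columns.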
It's also possible (and relatively easy) to custom-build your own SQL helpers, or to supplement pg-promise, by pulling structural data directly from the information_schema.tables and information_schema.columns tables.
One of the more tedious things about pg-promise is having to give it all the column names (and sometimes definitions, default values, etc.). If you're working with dozens or hundreds of separate tables, auto-generating this info directly from the database itself is probably simpler and more robust (you don't have to update arrays of column names every time you change your database).
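For instance, a small (hypothetical) helper that pulls column metadata straight from information_schema with node-postgres, so the column lists don't have to be maintained by hand:

async function get_columns(tableName) {
  const { rows } = await pool.query(
    `SELECT column_name, data_type, column_default, is_nullable
       FROM information_schema.columns
      WHERE table_name = $1
      ORDER BY ordinal_position`,
    [tableName]
  );
  return rows; // e.g. feed these into a pg-promise ColumnSet or your own SQL builder
}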
NOTE: You don't need to use pg-promise to submit queries generated by their helpers library. Personally, I like node-postgres better for actual db communications, and typically only use the pg-promise helpers library for building those bulk SQL statements.
NOTE2: It's worth noting that pg-promise wrote their own SQL injection protection (by escaping single quotes in values and double quotes in table/column names). The same would need to be done in the third option, whereas prepared statements are natively protected from SQL injection by the database server itself.
I currently have a table in my Postgres database with about 115k rows, and querying it feels too slow for my serverless functions. The only thing I need that table for is to look up values using operators like ILIKE, and I believe the network round trip is slowing things down a lot.
My thought was to take the table and turn it into a JavaScript array of objects, as it doesn't change often, if ever. I now have it in a file such as array.ts, whose contents look like:
export default [
{}, {}, {},...
]
What is the best way to query this huge array? Is it best to just use the .filter function? I am currently trying to import the array and filter it, but it seems to just hang and never actually complete. It is MUCH slower than the current DB approach, so I am unsure if this is the right approach.
Make the database faster
As people have commented, it's likely that the database will actually perform better than anything else given that databases are good at indexing large data sets. It may just be a case of adding the right index, or changing the way your serverless functions handle the connection pool.
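As one illustrative example of "the right index": for ILIKE '%...%' searches, PostgreSQL's pg_trgm extension with a GIN index usually makes a far bigger difference than a plain b-tree index. The table and column names below are assumptions:

// Run once, e.g. in a migration, with a node-postgres pool
async function addTrigramIndex(pool) {
  await pool.query('CREATE EXTENSION IF NOT EXISTS pg_trgm');
  await pool.query(
    `CREATE INDEX IF NOT EXISTS items_name_trgm_idx
       ON items USING gin (item_name gin_trgm_ops)`
  );
}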
Make local files faster
If you want to do it without the database, there are a couple of things that will make a big difference:
Read the file and then use JSON.parse, do not use require(...)
JavaScript is much slower to parse than JSON. You can therefore make things load much faster by storing the data as JSON and parsing it with JSON.parse.
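A minimal sketch of the difference, assuming the data is saved as array.json instead of array.ts:

const fs = require('fs');

// Slower: require() parses and evaluates the file as JavaScript
// const items = require('./array').default;

// Faster: read the same data as plain JSON and parse it
const items = JSON.parse(fs.readFileSync('./array.json', 'utf8'));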
Find a way to split up the data
Especially in a serverless environment, you're unlikely to need all the data for every request, and the serverless function will probably only serve a few requests before it is shut down and a new one is started.
If you could split your files up such that you typically only need to load an array of 1,000 or so items, things will run much faster.
Depending on the size of the objects, you might consider having a file that contains only the id of the objects & the fields needed to filter them, then having a separate file for each object so you can load the full object after you have filtered.
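A hypothetical layout for that: a small index.json holding only the id and the searchable field, plus one JSON file per full object (all file and field names here are assumptions):

const fs = require('fs');

// Small index: [{ id, item_name }, ...]
const index = JSON.parse(fs.readFileSync('./index.json', 'utf8'));

function search(pattern) {
  const re = new RegExp(pattern, 'i'); // rough stand-in for ILIKE
  return index
    .filter(entry => re.test(entry.item_name))
    .map(entry => JSON.parse(fs.readFileSync(`./items/${entry.id}.json`, 'utf8')));
}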
Use a local database
If the issue is genuinely the network latency, and you can't find a good way to split up the files, you could try using a local database engine.
@databases/sqlite can be used to query an SQLite database file that you could pre-populate with your array of values and index appropriately.
const openDatabase = require('@databases/sqlite');
const {sql} = require('@databases/sqlite');

// Open (or create) the pre-populated database file
const db = openDatabase('mydata.db');

async function query(pattern) {
  // Values interpolated via the sql tag are sent as parameters, not concatenated
  return await db.query(sql`SELECT * FROM items WHERE item_name LIKE ${pattern}`);
}

query('%foo%').then(results => console.log(results));
I'm using TypeORM with MS SQL Server.
In TypeORM's default setup, the SQL queries generated by the .insert and .update methods are compiled into parameterized queries.
Is there a way to switch to inlining the data instead of parameterization?
P.S. I know about the possibility of SQL injection in this case, but:
my data is validated in my code before being persisted, and
our tests (we operate with big data sets: 5M records with 1 integer column, and 10K records with 30 columns of different data types, which need to be inserted or used to update existing rows) show that inserting without parameterization works much faster.
You can use this style of inserts:
await getConnection()
  .createQueryBuilder()
  .insert()
  .into(User)
  .values({
    firstName: "Timber",
    lastName: () => "CONCAT('S', 'A', 'W')"
  })
  .execute();
And as you are aware, you need to escape anything inserted that way to protect against SQL injection.
Right now the way I am doing my workflow is like this:
get a list of rows from a Postgres database (let's say 10,000)
for each row I need to call an API endpoint and get a value, so 10,000 values returned from the API
for each row that has a value returned, I need to update a field in the database: 10,000 rows updated
Right now I am doing an update after each API fetch, but as you can imagine this isn't the most optimized way.
What other option do I have?
The bottleneck in that code is probably fetching the data from the API. This trick only lets you send many small queries to the DB faster, without having to wait for a round trip between each update.
To do multiple updates in a single query, you could use common table expressions and pack multiple small queries into a single CTE query:
https://runkit.com/embed/uyx5f6vumxfy
knex
  .with('firstUpdate', knex.raw('?', [knex('table').update({ colName: 'foo' }).where('id', 1)]))
  .with('secondUpdate', knex.raw('?', [knex('table').update({ colName: 'bar' }).where('id', 2)]))
  .select(1)
The knex.raw trick there is a workaround, since the .with(string, function) implementation has a bug.
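To scale that beyond two hand-written CTEs, you could build the .with() clauses in a loop from whatever the API returned. A sketch, assuming an apiResults array of { id, value } pairs and the same table/column names as in the snippet above:

function buildBatchUpdate(knex, apiResults) {
  const query = apiResults.reduce(
    (qb, row, i) =>
      qb.with(
        `update_${i}`,
        knex.raw('?', [knex('table').update({ colName: row.value }).where('id', row.id)])
      ),
    knex.queryBuilder()
  );
  return query.select(1); // one round trip instead of one per row
}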
I am working on an ASP.NET Web Forms project and I use jQuery DataTables to visualize data fetched from SQL Server. I need to pass the results for the current page and the total number of results, for which so far I have this code:
var queryResult = query.Select(p => new[] { p.Id.ToString(),
p.Name,
p.Weight.ToString(),
p.Address })
.Skip(iDisplayStart)
.Take(iDisplayLength).ToArray();
and the result that I get when I return this to the view, like:
iTotalRecords = queryResult.Count(),
is the number of records that the user has chosen to see per page. Logical, but I hadn't thought about it while building my method chain. Now I am thinking about the optimal way to implement this. Since it's likely to be used with relatively large amounts of data (10,000 rows, maybe more), I would like to leave as much work as I can to the SQL server. However, I found several questions asked about this, and the impression I get is that I have to make two queries to the database, or manipulate the total result in my code. But I think this won't be efficient when you have to work with many records.
So what can I do here to get the best performance?
Regarding what you're looking for, I don't think there is a simple answer.
I believe the only way you can currently do this is by running more than one query, as you have already suggested, whether that is encapsulated inside a stored procedure (SPROC) call or generated by EF.
However, I believe you can make optimisations to get your query to run quicker.
First of all, because you are chaining your methods together, every execution MAY result in the query being re-cached; this means the actual query being executed will need to be recompiled and cached by SQL Server (if that is your chosen technology) before being executed. This normally only takes a few milliseconds, but if the query itself only takes a few milliseconds to run, then this is relatively expensive.
Entity Framework will translate this LINQ query and execute it using derived tables. With a small result set of approx. 1k records to be paged, your current solution may be best suited. This would also depend upon how complex your SQL filtering is, as generated by your method chaining.
If your result set to be paged grows towards 15k, I would suggest writing a SPROC to get the best performance and scalability. It would insert the records into a temp table and run two queries against it: first to get the paged records, and second to get the total count.
alter proc dbo.usp_GetPagedResults
    @Skip int = 10,
    @Take int = 10
as
begin

    select
        row_number() over (order by id) [RowNumber],
        t.Name,
        t.Weight,
        t.Address
    into
        #results
    from
        dbo.MyTable t

    declare @To int = @Skip + @Take - 1

    select * from #results where RowNumber between @Skip and @To

    select max(RowNumber) from #results

end
go
You can use EF to map the SPROC call to entity types, or create a new custom type containing the results and the number of results.
Stored Procedures with Multiple Results
I found that the cost of running the above SPROC was approximately a third of running the query which EF generated to get the same result, based upon a result set size of 15k records. It was, however, three times slower than the EF query for a 1k record result set, due to the temp table creation.
Encapsulating this logic inside a SPROC allows the query to be refactored and optimised as your result set grows without having to change any application based code.
The suggested solution doesn’t use the derived table queries as created by the Entity Framework inside a SPROC as I found there was a marginal performance difference between running the SPROC and the query directly.
OK, we all know that in traditional SQL databases you have to escape data when inserting into the database, so that there is no SQL injection. In Cassandra NoSQL databases, are there any problems like that? Do we need to escape any data before we insert into Cassandra? Are there any security-related things I need to know?
An injection attack is much less of a concern with CQL for a number of reasons. For one, Cassandra will only execute one complete statement per query so any attack that, for example, attempted to concatenate a DROP, DELETE, or INSERT onto a SELECT, would fail. And, with the exception of BATCH (which requires a collection of complete INSERT, UPDATE, and DELETE statements), there are no nested queries.
That said, you should always sanitize your input, and you should make use of prepared statements, rather than constructing complete query statements in code.
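For example, with the DataStax Node.js driver (cassandra-driver), binding values and setting prepare: true does both of those things; the keyspace and table names here are just placeholders:

const cassandra = require('cassandra-driver');

const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'my_keyspace',
});

async function getUser(email) {
  // The value is bound as a parameter and the statement is prepared server-side,
  // so nothing from user input is concatenated into the CQL text
  const result = await client.execute(
    'SELECT * FROM users_by_email WHERE email = ?',
    [email],
    { prepare: true }
  );
  return result.rows;
}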