Does the Spanner Client support bulk inserts? - google-cloud-spanner

Google Cloud Spanner supports SQL multi-row ("bulk") inserts, e.g. from the documentation:
INSERT INTO Singers (SingerId, FirstName, LastName)
VALUES(1, 'Marc', 'Richards'),
(2, 'Catalina', 'Smith'),
(3, 'Alice', 'Trentor');
However, I cannot find any support for this in the Go client. The Go client's Statement type supports single-row inserts, and I have used the BatchUpdate() function to execute a batch of single-row inserts, but I cannot find any support for bulk inserts.
Does the Spanner client support bulk inserts?

Yes, there are a number of ways you can do that:
The one you mention yourself: use the BatchUpdate method to execute a collection of individual INSERT statements.
Execute a single INSERT statement that inserts multiple rows by calling the Update method with a multi-row VALUES clause, exactly like the SQL in the question.
The most efficient way to insert a large number of rows is to use mutations instead of DML: use the Apply method to write a collection of (insert) mutations in a single call (a sketch of this approach follows below).
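For illustration, here is a minimal sketch of the mutation approach. Since the other snippets on this page are Node.js, it uses the Node.js client (@google-cloud/spanner) rather than the Go client; in Go the equivalent is building spanner.Insert mutations and passing them to Client.Apply. The instance and database names are placeholders.

const { Spanner } = require('@google-cloud/spanner');
const spanner = new Spanner();
const database = spanner.instance('my-instance').database('my-database');
async function bulkInsertSingers() {
  // table.insert() turns each row object into an insert mutation and applies
  // the whole collection in a single commit.
  await database.table('Singers').insert([
    { SingerId: 1, FirstName: 'Marc', LastName: 'Richards' },
    { SingerId: 2, FirstName: 'Catalina', LastName: 'Smith' },
    { SingerId: 3, FirstName: 'Alice', LastName: 'Trentor' },
  ]);
}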

Related

Update multiple columns and values with node-postgres

Using node-postgres I want to update columns in my user model. At present I have this:
const { Pool } = require('pg');          // node-postgres
const format = require('pg-format');     // %I safely escapes the column identifier
const pool = new Pool();
async function update_user_by_email(value, column, email) {
  const sql = format('UPDATE users SET %I = $1 WHERE email = $2', column);
  await pool.query(sql, [value, email]);
}
So I can do this
await update_user_by_email(value, column_name, email_address);
However, if I want to update multiple columns and values, I am doing something very inefficient at the moment and calling that method X times (i.e. once for each query):
await update_user_by_email(value, column_name, email_address);
await update_user_by_email(value_2, column_name_2, email_address);
await update_user_by_email(value_3, column_name_3, email_address);
How can I generate this with just one call to the database?
Thanks
You have a few options here:
node-postgres allows you to create queries based on prepared statements. (This builds on PostgreSQL's native prepared statements.)
These are recommended by Postgres for populating a table, as a secondary option to using their COPY command. You would end up issuing more SQL statements (probably one per row), but the advantages of prepared statements are supposed to offset this somewhat.
You can also combine this with transactions, also mentioned in the Postgres "populate" documentation referenced above.
Another option is the approach taken by another library, pg-promise (specifically its helpers module). The pg-promise helpers literally build the SQL statement (as a string) for bulk insert/update statements. That way you can have a single statement that inserts or updates thousands of rows at a time.
It's also possible (and relatively easy) to custom-build your own SQL helpers, or to supplement pg-promise, by pulling structural data directly from the information_schema tables and columns views (a minimal hand-rolled helper is sketched after these notes).
One of the more tedious things about pg-promise is having to give it all the column names (and sometimes definitions, default values, etc.); if you're working with dozens or hundreds of separate tables, auto-generating this info directly from the database itself is probably simpler and more robust, since you don't have to update arrays of column names every time you change your database.
NOTE: You don't need to use pg-promise to submit the queries generated by its helpers library. Personally, I like node-postgres better for the actual DB communication, and typically only use the pg-promise helpers for building those bulk SQL statements.
NOTE 2: It's worth noting that pg-promise wrote its own SQL injection protection (by escaping single quotes in values and double quotes in table/column names). The same would need to be done in the third option, whereas prepared statements are natively protected from SQL injection by the database server itself.
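As a concrete illustration of the hand-rolled-helper route, here is a minimal sketch that updates several columns of one user in a single query. It assumes the users table from the question and uses pg-format purely for identifier escaping; the values themselves are still sent as ordinary bind parameters, and the column names in the usage comment are made up.

const { Pool } = require('pg');
const format = require('pg-format');
const pool = new Pool();
// `changes` is a plain object, e.g. { first_name: 'Ann', last_name: 'Lee' }.
async function update_user_by_email(changes, email) {
  const columns = Object.keys(changes);
  // Build "col1 = $1, col2 = $2, ..." with escaped identifiers and numbered
  // placeholders; the email lands in the final placeholder.
  const assignments = columns
    .map((col, i) => `${format('%I', col)} = $${i + 1}`)
    .join(', ');
  const sql = `UPDATE users SET ${assignments} WHERE email = $${columns.length + 1}`;
  return pool.query(sql, [...Object.values(changes), email]);
}
// One round trip instead of three:
// await update_user_by_email(
//   { first_name: 'Ann', last_name: 'Lee', status: 'active' },
//   'ann@example.com'
// );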

knex.js multiple updates optimised

Right now the way I am doing my workflow is like this:
get a list of rows from a Postgres database (let's say 10,000)
for each row I need to call an API endpoint and get a value, so 10,000 values returned from the API
for each row that has a value returned I need to update a field in the database: 10,000 rows updated
Right now I am doing an update after each API fetch, but as you can imagine this isn't the most optimized way.
What other option do I have?
The bottleneck in that code is probably fetching the data from the API. This trick only lets you send many small queries to the DB faster, without having to wait a round trip between each update.
To do multiple updates in a single query you can use common table expressions (CTEs) and pack multiple small update queries into one CTE query:
https://runkit.com/embed/uyx5f6vumxfy
knex
.with('firstUpdate', knex.raw('?', [knex('table').update({ colName: 'foo' }).where('id', 1)]))
.with('secondUpdate', knex.raw('?', [knex('table').update({ colName: 'bar' }).where('id', 2)]))
.select(1)
The knex.raw trick there is a workaround, since the .with(string, function) implementation has a bug.
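Generalizing that snippet to the question's workflow, here is a rough sketch that packs a chunk of { id, value } pairs returned from the API into one CTE query. The table and column names are placeholders, and for 10,000 rows you would still want to split the work into chunks of a few hundred updates per query.

// Build one query with N data-modifying CTEs, one per row to update.
function bulkUpdate(knex, rows) {
  let query = knex.queryBuilder();
  rows.forEach((row, i) => {
    query = query.with(
      `update_${i}`,
      knex.raw('?', [knex('table').update({ colName: row.value }).where('id', row.id)])
    );
  });
  return query.select(1);
}
// Usage, assuming `rows` came back from the API:
// await bulkUpdate(knex, [{ id: 1, value: 'foo' }, { id: 2, value: 'bar' }]);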

PostgreSQL node.js prepared statements maximum bindings

I am trying to do some big bulk inserts into Postgres via node-postgres.
When the bindings array exceeds 65536 values, only the remainder of the values gets passed to Postgres, and when the query runs I get the error:
[error: bind message supplies 4 parameters, but prepared statement "" requires 65540]
Any thoughts?
Thanks in advance.
Prepared statements in node-postgres are not suitable for bulk inserts, because they do not support multi-query statements. And you shouldn't stretch the array of bind variables across all inserts at once; that doesn't scale well and has its own limits, like the one you hit there.
Instead, you should use multi-value inserts, in the format of:
INSERT INTO table(col1, col2, col3) VALUES
(val-1-1, val-1-2, val-1-3),
(val-2-1, val-2-2, val-2-3),
...etc
Split your bulk inserts into queries like this, with 1,000-10,000 records each depending on the size of each record, and execute each one as a simple query (a sketch of such chunking follows below).
See also the Performance Boost article to understand INSERT scalability better.
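As a rough sketch of that chunking with node-postgres, assuming a three-column target table (the table and column names are placeholders): each chunk becomes one multi-value INSERT, so 1,000 rows use 3,000 bind parameters, comfortably below the roughly 65,535-parameter protocol limit hit above.

const { Pool } = require('pg');
const pool = new Pool();
// `rows` is an array of [col1, col2, col3] value arrays.
async function bulkInsert(rows, chunkSize = 1000) {
  for (let offset = 0; offset < rows.length; offset += chunkSize) {
    const chunk = rows.slice(offset, offset + chunkSize);
    // Build "($1, $2, $3), ($4, $5, $6), ..." for this chunk.
    const placeholders = chunk
      .map((_, i) => `($${i * 3 + 1}, $${i * 3 + 2}, $${i * 3 + 3})`)
      .join(', ');
    await pool.query(
      `INSERT INTO my_table (col1, col2, col3) VALUES ${placeholders}`,
      chunk.flat()
    );
  }
}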

Insert multiple records at once in Cassandra

I've been researching a lot about how to insert multiple records directly from the Cassandra cqlsh console. I found something about batches, so I thought of using one with a loop (for, while), but it seems that Cassandra does not support batch.
How can I insert multiple records directly from the Cassandra console? Is there something like a stored procedure in Cassandra?
Cassandra does not (at this time) have stored procedures, but you should be able to accomplish this with a batch statement. Essentially you should be able to encapsulate multiple INSERTs inside of BEGIN BATCH and APPLY BATCH statements. This example is from the DataStax documentation on batch operations.
BEGIN BATCH
INSERT INTO purchases (user, balance) VALUES ('user1', -8) USING TIMESTAMP 19998889022757000;
INSERT INTO purchases (user, expense_id, amount, description, paid)
VALUES ('user1', 1, 8, 'burrito', false);
APPLY BATCH;
Check the doc linked above for more information.
Edit:
If you mean to INSERT several million records at once, then you should consider other methods. The cqlsh COPY command is a viable alternative (for a few million records or fewer), or the Cassandra Bulk Loader for 10 million or more.

Do you need to escape data inputs when inserting to Cassandra?

OK, we all know that in traditional SQL databases you have to escape data when inserting into the database, so that there is no SQL injection. In Cassandra's NoSQL database, are there any problems like that? Do we need to escape any data before we insert it into Cassandra? Any security-related things I need to know?
An injection attack is much less of a concern with CQL for a number of reasons. For one, Cassandra will only execute one complete statement per query so any attack that, for example, attempted to concatenate a DROP, DELETE, or INSERT onto a SELECT, would fail. And, with the exception of BATCH (which requires a collection of complete INSERT, UPDATE, and DELETE statements), there are no nested queries.
That said, you should always sanitize your input, and you should make use of prepared statements, rather than constructing complete query statements in code.
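For example, with the DataStax Node.js driver (cassandra-driver), values go to the server as bound parameters of a prepared statement rather than being spliced into the CQL string. This sketch reuses the purchases table from the earlier batch example; the contact point, data center, and keyspace are placeholders.

const cassandra = require('cassandra-driver');
const client = new cassandra.Client({
  contactPoints: ['127.0.0.1'],
  localDataCenter: 'datacenter1',
  keyspace: 'my_keyspace',
});
async function recordPurchase(user, expenseId, amount, description) {
  const query =
    'INSERT INTO purchases (user, expense_id, amount, description, paid) ' +
    'VALUES (?, ?, ?, ?, false)';
  // { prepare: true } makes the driver prepare the statement once and bind
  // the values separately, so user input can never alter the statement itself.
  await client.execute(query, [user, expenseId, amount, description], {
    prepare: true,
  });
}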
