Node.js and Postgres: bulk upsert or another pattern?

I am using Postgres, NodeJS and Knex.
I have the following situation:
A database table with a unique id field.
In Node.js I have an array of objects and, for each object, I need to:
a. Insert a new row, if the table does not contain the unique id, or
b. Update the remaining fields, if the table does contain the unique id.
As far as I know, I have three options:
1. Query each item to check whether it exists in the database and, based on the response, do an update or an insert. This costs resources because there is a call for each array item, plus an insert or an update.
2. Delete all rows whose ids are in the array and then perform a bulk insert. This would mean only two operations, but the autoincrement field will keep growing.
3. Perform an upsert, since Postgres 9.5 supports it. Bulk upsert seems to work and there is only one call to the database.
Looking through the options I am aware of, upsert seems the most reasonable one, but does it have any drawbacks?
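For reference, option 3 maps to Postgres's INSERT ... ON CONFLICT. A minimal sketch, assuming a table items with a unique column uid (all names and values here are hypothetical):

```
-- Insert multiple rows in one statement; on a uid conflict,
-- update the remaining fields from the proposed row (EXCLUDED).
INSERT INTO items (uid, name, qty)
VALUES ('a1', 'apples', 10),
       ('b2', 'bananas', 5)
ON CONFLICT (uid) DO UPDATE
SET name = EXCLUDED.name,
    qty  = EXCLUDED.qty;
```

Recent Knex versions expose the same pattern as `knex('items').insert(rows).onConflict('uid').merge()`.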

Upsert is a common way.
Another way is to use separate insert and update operations; most likely it will be faster:
1. Find the existing rows:
select id from t where id in (object-ids) (*)
2. Update the existing rows using the (*) result.
3. Filter the array by the (*) result and bulk-insert the new rows.
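The filtering step in (3) is plain array logic. A minimal sketch, assuming each object carries its unique id in an `id` field (the names are hypothetical):

```javascript
// Split incoming objects into updates (id already in the table)
// and inserts (id not yet present), given the ids returned by (*).
function partitionByExisting(objects, existingIds) {
  const existing = new Set(existingIds);
  const toUpdate = [];
  const toInsert = [];
  for (const obj of objects) {
    (existing.has(obj.id) ? toUpdate : toInsert).push(obj);
  }
  return { toUpdate, toInsert };
}
```

The toUpdate rows then go into individual (or batched) UPDATE statements, and toInsert into a single bulk INSERT.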

Related

How to REPLACE INTO using MERGE INTO for upsert in Delta Lake?

The recommended way of doing an upsert in a Delta table is the following:
MERGE INTO users
USING updates
ON users.userId = updates.userId
WHEN MATCHED THEN
  UPDATE SET address = updates.address
WHEN NOT MATCHED THEN
  INSERT (userId, address) VALUES (updates.userId, updates.address)
Here, updates is a table. My question is: how can we do an upsert directly, that is, without using a source table? I would like to supply the values directly myself.
In SQLite, we could simply do the following.
REPLACE INTO table(column_list)
VALUES(value_list);
Is there a simple way to do that for Delta tables?
A source table can be a subquery, so the following should give you what you're after:
MERGE INTO events
USING (VALUES (...)) -- round brackets are required to denote a subquery
ON false             -- an artificial merge condition
WHEN NOT MATCHED THEN INSERT *
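Filled in with literal values as an actual upsert (column names taken from the first question; the values are made up), that pattern could look like:

```
MERGE INTO users
USING (SELECT 1 AS userId, '12 Main St' AS address) updates
ON users.userId = updates.userId
WHEN MATCHED THEN
  UPDATE SET address = updates.address
WHEN NOT MATCHED THEN
  INSERT (userId, address) VALUES (updates.userId, updates.address)
```

Here the inline SELECT plays the role of the source table, so no separate table has to exist.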

Cassandra: Using the output of one query as input to another query

I have two tables, one is users and the other is expired_users.
users columns -> id, name, age
expired_users columns -> id, name
I want to execute the following query.
delete from users where id in (select id from expired_users);
This query works fine with SQL databases. I want to find a way to solve this in Cassandra.
PS: I don't want to add any extra columns in the tables.
When designing a Cassandra data model, we cannot think exactly like in an RDBMS.
Design like this:
create table users (
    id int,
    name text,
    age int,
    expired boolean static,
    primary key (id, name)
);
To mark a user as expired, just insert the same row again:
insert into users (id, name, age, expired) values (100, 'xyz', 80, true);
You don't have to update or delete the row; just insert it again and the previous column values will be overwritten.
What you want is to use a join as a filter for your DELETE statement, and that is not what the Cassandra model is built for.
AFAIK there is no way to perform this using CQL. If you want to perform this action without changing the schema, run an external script in any language that has a driver for Cassandra.
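A minimal sketch of that external-script approach as two round trips (the script reads the ids returned by the first query and substitutes them into the second; 7 and 42 are made-up values):

```
SELECT id FROM expired_users;
-- suppose the result is 7 and 42; the script then issues:
DELETE FROM users WHERE id IN (7, 42);
```

Note that IN on the partition key works, but very large id lists should be chunked into several DELETE statements.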

Insert docs in sorted order on mongodb

In MongoDB, I want to insert data in sorted order based on some field.
The way I am doing it now: before insertion, I compare the data with the data already in the collection and then insert it at that particular position. Is insertion at a particular position possible in MongoDB using Node.js?
You can't insert a doc at a specific spot in the collection. Even if you could, it wouldn't matter, because you can't rely on the natural order of MongoDB documents staying consistent: documents can move over time as they are updated.
Instead, create an index on the field(s) you need your docs sorted on and then include a sort clause in your queries to efficiently retrieve the docs in that order.
Example in the shell:
// Create the index (do this once)
db.test.createIndex({someField: 1})
// Sorted query
db.test.find().sort({someField: 1})

Select TTL for an element in a map in Cassandra

Is there any way to select TTL value for an element in a map in Cassandra with CQL3?
I've tried this, but it doesn't work:
SELECT TTL (mapname['element']) FROM columnfamily
Sadly, I'm pretty sure the answer is that it is not possible as of Cassandra 1.2 and CQL3. You can't query individual elements of a collection. As this blog entry says, "You can only retrieve a collection in its entirety". I'd really love to have the capability to query for collection elements, too, though.
You can still set the TTL for individual elements in a collection. I suppose if you wanted to be assured that a TTL is some value for your collection elements, you could read the entire collection and then update the collection (the entire thing or just a chosen few elements) with your desired TTL. Or, if you absolutely needed to know the TTL for individual data, you might just need to change your schema from collections back to good old dynamic columns, for which the TTL query definitely works.
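For completeness, setting a TTL on a single map element is done with UPDATE ... USING TTL (the column names follow the question; the one-day TTL, value, and key column name are assumptions):

```
UPDATE columnfamily USING TTL 86400
SET mapname['element'] = 'somevalue'
WHERE key = 'k1';
```

So writes can carry per-element TTLs; it is only reading them back via ttl() that CQL3 does not support for collection elements.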
Or, a third possibility could be that you add another column to your schema that holds the TTL of your collection. For example:
CREATE TABLE test (
    key text PRIMARY KEY,
    data map<text, text>,
    data_ttl text
) WITH ...
You could then keep track of the TTL of the entire map column 'data' by always updating column 'data_ttl' whenever you update 'data'. Then, you can query 'data_ttl' just like any other column:
SELECT ttl(data_ttl) FROM test;
I realize none of these solutions are perfect... I'm still trying to figure out what will work best for me, too.

Azure query using the select

I am trying to write a query in Azure that gets the entity with a given partition key and row key, based on date.
I am storing entities as:
PartitionKey, RowKey, Date, additional info.
I am looking for a query using TableService so that I always get the latest entity (by Date).
How can I build that query? (I am using Node and Azure.)
TableQuery
.select()
.from('myusertables')
.where('PartitionKey eq ?', '545455');
How do I write the table query?
To answer your question, check out this previously answered question: How to select only the records with the highest date in LINQ
However, you may be facing a design issue. Performing the operation you are trying to do will require you to pull all the entities from the underlying Azure Table, which will get slower over time as entities are added. So you may want to reconsider your design and possibly change the way you use your PartitionKey and RowKey. You could also store the latest entities in a separate table, so that only one entity is found per table, transforming your scan/filter into a seek operation. Food for thought...
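One common way to make "latest first" cheap in Azure Table Storage is to derive the RowKey from an inverted timestamp, so that lexicographic order equals newest-first and the first row returned for a partition is the most recent one. A sketch (the constant and padding width are assumptions, sized to cover millisecond timestamps well beyond the year 2200):

```javascript
// RowKey that sorts newest-first: subtract the timestamp from a
// large constant and zero-pad so string comparison matches numeric order.
function invertedRowKey(dateMs) {
  const MAX_MS = 99999999999999; // 14 digits
  return String(MAX_MS - dateMs).padStart(14, '0');
}
```

With RowKeys built this way, querying the partition and taking the first result (e.g. with a top-1 query) returns the latest entity without scanning the whole partition.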
