Properly chaining RethinkDB table and object creation commands with rethinkdbdash - node.js

I am processing a stream of text data where I don't know ahead of time what the distribution of its values is, but I know each record looks like this:
{
  "datetime": "1986-11-03T08:30:00-07:00",
  "word": "wordA",
  "value": "someValue"
}
I'm trying to bucket it into RethinkDB objects based on its value, where the objects look like the following:
{
  "bucketId": "1",
  "bucketValues": {
    "wordA": [
      {"datetime": "1986-11-03T08:30:00-07:00"},
      {"datetime": "1986-11-03T08:30:00-07:00"}
    ],
    "wordB": [
      {"datetime": "1986-11-03T08:30:00-07:00"},
      {"datetime": "1986-11-03T08:30:00-07:00"}
    ]
  }
}
The purpose is to eventually count the number of occurrences for each word in each bucket.
Since I'm dealing with about a million buckets, and have no knowledge of the words ahead of time, the plan is to create these objects on the fly. I am new to RethinkDB, however, and I have tried my best to do this in such a way that I don't attempt to add a word key to a bucket that doesn't exist yet, but I am not entirely sure if I'm following best practice here chaining the commands as follows (note that I am running this on a Node.js server using rethinkdbdash):
var bucketId = "someId";
var word = "someWordValue";

r.do(r.table("buckets").get(bucketId), function(result) {
  return r.branch(
    // If the bucket doesn't exist
    result.eq(null),
    // Create it
    r.table("buckets").insert({
      "id": bucketId,
      "bucketValues": {}
    }),
    // Else do nothing
    "Bucket already exists"
  );
})
.run()
.then(function(result) {
  console.log(result);
  r.table("buckets").get(bucketId)
    .do(function(bucket) {
      return r.branch(
        // If the word already exists
        bucket("bucketValues").keys().contains(word),
        // Just append to it (code not implemented yet)
        "Word already exists",
        // Else create the word and append it
        r.table("buckets").get(bucketId).update(
          {"bucketValues": r.object(word, [/*Put the timestamp here*/])}
        )
      );
    })
    .run()
    .then(function(result) {
      console.log(result);
    });
});
Do I need to execute run here twice, or am I way off base on how you're supposed to properly chain things together with RethinkDB? I just want to make sure I'm not doing this the wrong/hard way before I get much deeper into this.

You don't have to execute run multiple times; it depends on what you want. Basically, run() ends the chain and sends the query to the server: we do all the work of building the query, then end it with run() to execute it. If you call run() two times, the query is sent to the server two times.
So if we can do all the processing using only RethinkDB functions, we only need to call run once. However, if we want to do some kind of post-processing of the data on the client side, then we have no choice. Usually I try to do all the processing in RethinkDB: with control structures, looping, and anonymous functions we can go pretty far without having the client do any of the logic.
In your case, the query can be rewritten in Node.js using the official driver:
var r = require('rethinkdb');

var bucketId = "someId2";
var word = "someWordValue2";

r.connect()
  .then((conn) => {
    r.table("buckets").insert({
      "id": bucketId,
      "bucketValues": {}
    })
    .do((result) => {
      // We don't care about the result at all;
      // we just want to ensure the bucket is there
      return r.table('buckets').get(bucketId)
        .update(function(bucket) {
          return {
            'bucketValues': r.object(
              word,
              bucket('bucketValues')(word).default([])
                .append(r.now()))
          };
        });
    })
    .run(conn)
    .then((result) => { conn.close(); });
  });
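Since the question title mentions rethinkdbdash: for reference, a minimal sketch of the same single-round-trip query with that driver, which maintains a connection pool for you, so there is no explicit connect() and run() takes no connection argument. The details beyond what the driver documents are assumptions:

var r = require('rethinkdbdash')();

var bucketId = "someId2";
var word = "someWordValue2";

r.table("buckets").insert({
  "id": bucketId,
  "bucketValues": {}
})
.do(() => {
  // Same idea as above: ensure the bucket exists, then append a timestamp
  // to the word's array, defaulting to [] if the word is new
  return r.table('buckets').get(bucketId)
    .update((bucket) => {
      return {
        'bucketValues': r.object(
          word,
          bucket('bucketValues')(word).default([]).append(r.now()))
      };
    });
})
.run()
.then((result) => { console.log(result); });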

Related

Structuring a query response with PostgreSQL

I am trying to construct a query to return data from multiple tables and build them into a single array of objects to return to the client. I have two tables, incidents and sources. Each source has an incident_id that corresponds to an incident in the first table.
Since there can be more than one source per incident, I want to query for the incidents, then add to each incident a src key whose value is the array of associated sources. The desired final structure is this:
{
  "incident_id": 1,
  "id": "wa-olympia-1",
  "city": "Olympia",
  "state": "Washington",
  "lat": 47.0417,
  "long": -122.896,
  "title": "Police respond to broken windows with excessive force",
  "desc": "Footage shows a few individuals break off from a protest to smash City Hall windows. Protesters shout at vandals to stop.\n\nPolice then arrive. They arrest multiple individuals near the City Hall windows, including one individual who appeared to approach the vandals in an effort to defuse the situation.\n\nPolice fire tear gas and riot rounds at protesters during the arrests. Protesters become agitated.\n\nAfter police walk arrestee away, protesters continue to shout at police. Police respond with a second bout of tear gas and riot rounds.\n\nA racial slur can be heard shouted, although it is unsure who is shouting.",
  "date": "2020-05-31T05:00:00.000Z",
  "src": ["http://google.com"]
}
Here is the route as it stands:
router.get('/showallincidents', (req, res) => {
  Incidents.getAllIncidents()
    .then((response) => {
      const incidents = response.map((incident) => {
        const sources = Incidents.createSourcesArray(incident.incident_id);
        return {
          ...incident,
          src: sources,
        };
      });
      res.json(incidents);
    })
    .catch((err) => {
      res.status(500).json({ message: 'Request Error' });
    });
});
Here are the models I currently have:
async function getAllIncidents() {
  return await db('incidents');
}

async function createSourcesArray(incident_id) {
  const sources = await db('sources')
    .select('*')
    .where('sources.incident_id', incident_id);
  return sources;
}
When this endpoint is hit I get a "too many connections" error. Please advise.
I found a solution. I decided to query the two tables independently. Then I looped through the first result array, and within that loop looped through the second array, checking for the foreign key they share; when I found a match, I added those results to an array on the original object, then returned a new array with the objects of the first table and the associated data from the second. The models are unchanged; here is the updated route.
router.get('/showallincidents', async (req, res) => {
  try {
    const incidents = await Incidents.getAllIncidents();
    const sources = await Incidents.getAllSources();
    const responseArray = [];
    // Reconstructs each incident object with its sources to send to the front end
    incidents.forEach((incident) => {
      incident['src'] = [];
      sources.forEach((source) => {
        if (source.incident_id === incident.incident_id) {
          incident.src.push(source);
        }
      });
      responseArray.push(incident);
    });
    res.json(responseArray);
  } catch (e) {
    res.status(500).json({
      message: 'Request Error'
    });
  }
});
Are the two tables in the same database? If so, it is much more efficient to do the primary/foreign key match with an SQL join (a sketch follows). What you have implemented is a "nested loop join", which might not be the optimal way to match, depending on the value distribution of the primary key. You can search for SQL join algorithms to see examples and pros and cons.
If the tables are in different databases, then a client-side join is indeed likely your only option. Though again, if you know something about the underlying distribution, it might be better to do a hash join.
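For illustration, a minimal sketch of the join approach, assuming the models use knex (as the db('incidents') call style suggests) and that the sources table has a src column; the exact column names are assumptions:

async function getAllIncidentsWithSources() {
  // One round trip: a LEFT JOIN keeps incidents that have no sources
  const rows = await db('incidents')
    .leftJoin('sources', 'sources.incident_id', 'incidents.incident_id')
    .select('incidents.*', 'sources.src');

  // Group the flat joined rows back into one object per incident
  const byId = {};
  rows.forEach(({ src, ...incident }) => {
    if (!byId[incident.incident_id]) {
      byId[incident.incident_id] = { ...incident, src: [] };
    }
    if (src) byId[incident.incident_id].src.push(src);
  });
  return Object.values(byId);
}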

How do I iterate over the object containing queries, and execute them all

I have an object that looks like this:
let queries = [
  {
    name: "checkUsers",
    query: 'select * from users where inactive = 1'
  },
  {
    name: "checkSubscriptions",
    query: 'select * from subscriptions where invalid = 1'
  }
];
I am making an AWS Lambda function that will iterate these queries, and if any of them returns a value, I will send an email.
I have come up with this pseudo code:
for (let prop in queries) {
const result = await mysqlConnector.runQuery(prop.query).catch(async error => {
// handle error in query
});
if (result.length < 0){
// send email
}
}
return;
I am wondering, is this the ideal approach? I need to iterate over all the query objects.
I don't see anything wrong with what you are trying to achieve, but there are a few changes you could make.
Try to use Promise.all if you can (see the sketch below). This will speed up the overall process, as the queries will execute in parallel. How much it helps will depend on the number of queries as well.
Try to leverage executing multiple statements in one query. This way you make one call to the database and then add logic to identify which check each result belongs to.
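As an illustration of the first point, a hedged sketch that runs all the queries in parallel, assuming mysqlConnector.runQuery returns a promise of rows as in the question:

async function runChecks(queries) {
  // Fire all queries at once and wait for every result
  const results = await Promise.all(
    queries.map(async ({ name, query }) => {
      const rows = await mysqlConnector.runQuery(query);
      return { name, rows };
    })
  );
  for (const { name, rows } of results) {
    if (rows.length > 0) {
      // send email for this check, e.g. referencing `name` in the subject
    }
  }
}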

How to avoid two concurrent API requests breaking the logic behind document validation?

I have an API that validates a new item before inserting it. The validation is basically a type validator (string, number, Date, etc.) plus a query to the database that checks whether the "user" already has an "item" on the same date; if it does, the validation is unsuccessful.
Pseudocode goes like this:
const Item = require("./models/item");

async function post(newDoc) {
  let errors = await checkForDocErrors(newDoc);
  if (errors) {
    throw errors;
  }
  let itemCreated = await Item.create(newDoc);
  return itemCreated;
}
My problem is if I do two concurrent requests like this:
const request = require("superagent");

// Inserts a new Item
request.post('http://127.0.0.1:5000/api/item')
  .send({
    "id_user": "6c67ea36-5bfd-48ec-af62-cede984dff9d",
    "start_date": "2019-04-02",
    "name": "Water Bottle"
  })

/*
  Inserts a new Item, which it shouldn't do, resulting in two items
  having the same date.
*/
request.post('http://127.0.0.1:5000/api/item')
  .send({
    "id_user": "6c67ea36-5bfd-48ec-af62-cede984dff9d",
    "start_date": "2019-04-02",
    "name": "Toothpick"
  })
Both will be successful, which they shouldn't be, since a "user" cannot have two "items" on the same date.
If I execute the second one after the first is finished, everything works as expected.
request.post('http://127.0.0.1:5000/api/item') // Inserts a new Item
  .send({
    "id_user": "6c67ea36-5bfd-48ec-af62-cede984dff9d",
    "start_date": "2019-04-02",
    "name": "Water Bottle"
  })
  .then((res) => {
    // It is not successful since there is already an item with that date,
    // as expected
    request.post('http://127.0.0.1:5000/api/item')
      .send({
        "id_user": "6c67ea36-5bfd-48ec-af62-cede984dff9d",
        "start_date": "2019-04-02",
        "name": "Toothpick"
      })
  })
To avoid this I currently send one request with an array of documents, but I want to prevent this issue, or at least make it less likely to happen.
SOLUTION
I set up a Redis server, used the redis-lock package, and wrapped the lock around the POST route handler.
var client = require("redis").createClient();
var lock = require("redis-lock")(client);
var itemController = require('./controllers/item');

router.post('/', function(req, res) {
  let userId = "";
  if (typeof req.body === 'object' && typeof req.body.id_user === 'string') {
    userId = req.body.id_user;
  }
  lock('POST ' + req.path + userId, async function(done) {
    try {
      let result = await itemController.post(req.body);
      res.json(result);
    } catch (e) {
      res.status(500).send("Server Error");
    }
    done();
  });
});
Thank you.
Explanation:
That is a race condition:
"two or more threads can access shared data and they try to change it at the same time"
(What is a race condition?)
Solution:
There are many ways to prevent conflicting data in this case; a lock is one option.
You can lock at the application level or the database level, but I suggest you read this thread before choosing either of them:
Optimistic vs. Pessimistic locking
Quick solution: a pessimistic lock, e.g. https://www.npmjs.com/package/redis-lock
You should create a composite index or a composite primary key that includes the id_user and the start_date fields. This will ensure that no two documents for the same user with the same date can be created, and the database will throw an error if you try to do it.
Composite index with mongoose
You could also use transactions. To do this, you would execute the find and the create methods inside a transaction, so that the check and the insert see a consistent view of the data (see the sketch below).
Mongoose transactions tutorial
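For illustration, a minimal hedged sketch of the transaction approach, assuming Mongoose connected to a MongoDB replica set (transactions require one); checkForDocErrors from the question is replaced here by an inline findOne check:

const mongoose = require("mongoose");
const Item = require("./models/item");

async function post(newDoc) {
  const session = await mongoose.startSession();
  try {
    let itemCreated;
    await session.withTransaction(async () => {
      // The check and the insert run inside the same transaction
      const clash = await Item.findOne({
        id_user: newDoc.id_user,
        start_date: newDoc.start_date
      }).session(session);
      if (clash) throw new Error("User already has an item on that date");
      const created = await Item.create([newDoc], { session });
      itemCreated = created[0];
    });
    return itemCreated;
  } finally {
    session.endSession();
  }
}

Note that two truly concurrent transactions can still both pass the findOne check before either insert commits, so the unique composite index below remains the more robust guard.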
More info
I would go with a unique composite index, which in your specific case would be something like:
mySchema.index({ id_user: 1, start_date: 1 }, { unique: true });
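As a usage note, a hedged sketch of catching the duplicate-key error that the unique index will raise (MongoDB reports this as error code 11000):

try {
  await Item.create(newDoc);
} catch (e) {
  if (e.code === 11000) {
    // The unique index rejected a second item for this user/date
    throw new Error("User already has an item on that date");
  }
  throw e;
}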

How to chain groups of observables that previously were each in a forkJoin() operation

I have an orders table in MySQL, and each order has a number of documents associated with it, whether they are quotes, invoices, etc. There is therefore a second table called "documents", which has a "document_id" primary key and an "order_id" foreign key. In a similar fashion, I have another table for the different checks that technicians do on every vehicle, and another table for vehicle pictures. I am creating a web service using Node and Express that needs to return JSON similar to this:
[
  {
    "order_id": 1003,
    "customer_id": 8000,
    "csi": 90,
    "date_admitted": "2016-10-28T05:00:00.000Z",
    "plates": "YZG-5125",
    ...
    documents: {
      "type": "invoice",
      "number": "1234",
      ...
    },
    checks: {
      "scanner": "good",
      "battery": "average",
      ...
    },
    vehicle_pictures: {
      "title": "a title...",
      "path": "the file path"
      ...
    }
  },
  {
    ...
  },
  ...
]
As you can see, it is necessary to do three queries for each order: one for checks, another for documents, and a third for pictures. I then need to add these sub-results to each order and finally return the array in the response.
This would be a very easy task in the old world of synchronous programming; however, due to the asynchronous nature of the query() method in the connection object of the mysql library, this threatens to become a real hell.
In a situation where I had to process a single order, using the RxJS library on the server with forkJoin() would suffice to process all three results at once. What I am not sure about is how to "chain" every order (with a forkJoin managing the 3 queries) so that everything gets processed and at the end I can call res.json(result) with everything neatly assembled.
Note: I want to solve this with RxJS instead of using a sync library package like node-mysql-libmysqlclient. The reason, basically, is that the "right" way to do this in an async language like Node.js is to go async. I also want to use RxJS and not async, q promises, or any other library, since Observables seem to be the absolute winner in the async solutions contest, and I want to be consistent in all the solutions I develop, so this question is mostly aimed at RxJS masters.
Also, every single question I have found on SO similar to this has the classical "purist" reply saying that if you are using Node you "should" go asynchronous and not think in synchronous solutions. So this is a challenge for those who defend that position, since this (I think) is one of those cases where sync in Node makes sense; however, I really want to learn how to do this with RxJS instead of concluding that it is impossible, which I am sure it is not.
If I understood things correctly, you have some data that you want to use to gather additional data from the database via async operations. You want to build a combined dataset consisting of the original data and the additional information that the subsequent queries return.
As you have mentioned, you can use forkJoin to wait for multiple operations to complete before proceeding. You have to do this for each item in the data sequence and then use switchMap to merge the result back into the original stream.
Have a look at the following example (originally a jsbin) that demonstrates how this can be done:
const data = [
  { id: 1, init: 'a' },
  { id: 2, init: 'b' },
  { id: 3, init: 'c' }
];

function getA(id) {
  return Rx.Observable.timer(1000)
    .map(() => {
      return { id, a: 'abc' };
    })
    .toPromise();
}

function getB(id) {
  return Rx.Observable.timer(1500)
    .map(() => {
      return { id, b: 'def' };
    })
    .toPromise();
}

Rx.Observable.interval(5000)
  .take(data.length)
  .map(i => data[i])
  .do(data => { console.log(`query id ${data.id}`); })
  .switchMap((data) => {
    return Rx.Observable.forkJoin(getA(data.id), getB(data.id), (a, b) => {
      console.log(`got results for id ${data.id}`);
      return Object.assign({}, data, a, b);
    });
  })
  .subscribe(x => console.log(x));
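To connect this pattern to the question's MySQL queries, a hedged sketch of wrapping the callback-based query() in an Observable, assuming a node-mysql connection object; the table and column names are illustrative:

// query() calls back with (err, rows, fields), so the bound observable
// emits [rows, fields]; we keep only the rows
const queryObs = Rx.Observable.bindNodeCallback(
  connection.query.bind(connection)
);

function getDocuments(orderId) {
  return queryObs('SELECT * FROM documents WHERE order_id = ?', [orderId])
    .map(([rows]) => rows);
}

// getChecks and getPictures would look the same, and the three results
// can then be combined per order with Rx.Observable.forkJoin as above.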

Pushing new value to array in MongoDB document with NodeJS

I have a MongoDB collection with documents that look as follows:
{
  "id": 51584,
  "tracks": [],
  "_id": {
    "$oid": "ab5a7... some id ...cc81da0"
  }
}
I want to push a single track into the array, so I try the following Node.js code:
function addTrack(post, callback) {
  var partyId = post['partyId'], trackId = post['trackId'];
  // I checked here that partyId and trackId are valid vars.
  db.db_name.update({id: partyId}, { $push: { tracks: [trackId] } }, function(err, added) {
    if (err || !added) {
      console.log("Track not added.");
      callback(null, added);
    }
    else {
      console.log("Track added to party with id: " + partyId);
      callback(null, added);
    }
  });
}
This returns successfully with the callback that the track was added. However, when I inspect the database manually it is not updated and the array tracks is still empty.
I've tried a lot of different things for the tracks element to be pushed (i.e. turning it into an array, etc.) but no luck so far.
PS: Perhaps I should note that I'm using MongoLab to host the database.
Any help would be most welcome.
I found my problem: in the addTrack update({id: partyId}, ...) call, partyId was not a string, so the query didn't match any docs to push to. Thanks to SudoGetBeer for leading me to the solution.
If your posted document is correct (i.e. retrieved via find()):
tracks is a subdocument or embedded document (http://docs.mongodb.org/manual/tutorial/query-documents/#embedded-documents).
The difference is simple: {} = document, [] = array.
So if you want to use $push, you need the tracks field to be an array.
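As a hedged illustration of the $push semantics involved, using the field names from the question:

// Pushes the single value trackId onto the tracks array:
db.db_name.update({ id: partyId }, { $push: { tracks: trackId } }, callback);

// Pushes the whole array as ONE element, nesting it
// (this is what the question's code does):
db.db_name.update({ id: partyId }, { $push: { tracks: [trackId] } }, callback);

// To push each element of an array individually, use $each:
db.db_name.update({ id: partyId }, { $push: { tracks: { $each: [trackId] } } }, callback);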
Here's how I'm doing it:
// This code occurs inside an async function called editArticle()
const addedTags = ['one', 'two', 'etc'];

// ADD NEW TAGS IN MONGO DB
try {
  const updateTags = addedTags.reduce((all, tag) => {
    all.push(articles.updateOne({ slug: article.slug }, { $push: { tags: tag } }));
    return all;
  }, []);
  await Promise.all(updateTags);
} catch (e) {
  log('error', 'addTags', e);
  throw 'addTags';
}
addedTags is the array of tags, and we need to push them into MongoDB one at a time so that the document we are pushing into looks like this afterwards:
{
  tags: ["existingTag1", "existingTag2", "one", "two", "etc"]
}
If you push an array like in the original question above, it would look like this:
{
  tags: ["existingTag1", "existingTag2", ["one", "two", "etc"]]
}
So, tags[2] would be ["one", "two", "etc"], not what you want.
I have used .reduce(), an accumulator pattern, which is a fancy, immutable way of doing this:
let updateTags = [];
addedTags.forEach((tag) => {
  updateTags.push(articles.updateOne({ slug: article.slug }, { $push: { tags: tag } }));
});
At this point, updateTags contains an array of promises (each updateOne call is already in flight), so calling Promise.all(updateTags) waits for them all and detonates if any of them fail. Since we are using the MongoDB native driver, you will have to clean up if any errors occur, so you will probably want to track the pre-write state before calling Promise.all() (i.e., what were the tags before?).
In the catch block, or the catch block of the upper scope, you can fire "restore previous state" logic (rollback) and/or retry.
Something like:
// Upper scope
catch (e) {
  if (e === 'addTags') rollback(previousState);
  throw 'Problem occurred adding tags, please restart your computer.';
  // This can now bubble up to your front-end client
}
