Structuring a query response with PostgreSQL - node.js

I am trying to construct a query that returns data from multiple tables and builds them into a single array of objects to return to the client. I have two tables, incidents and sources. Each source has an incident_id that corresponds to an incident in the first table.
Since there can be more than one source per incident, I want to query for the incidents, then add a src key to each incident whose value is the array of associated sources. The desired final structure is this:
{
  "incident_id": 1,
  "id": "wa-olympia-1",
  "city": "Olympia",
  "state": "Washington",
  "lat": 47.0417,
  "long": -122.896,
  "title": "Police respond to broken windows with excessive force",
  "desc": "Footage shows a few individuals break off from a protest to smash City Hall windows. Protesters shout at vandals to stop.\n\nPolice then arrive. They arrest multiple individuals near the City Hall windows, including one individual who appeared to approach the vandals in an effort to defuse the situation.\n\nPolice fire tear gas and riot rounds at protesters during the arrests. Protesters become agitated.\n\nAfter police walk arrestee away, protesters continue to shout at police. Police respond with a second bout of tear gas and riot rounds.\n\nA racial slur can be heard shouted, although it is unsure who is shouting.",
  "date": "2020-05-31T05:00:00.000Z",
  "src": ["http://google.com"]
}
Here is the route as it stands:
router.get('/showallincidents', (req, res) => {
  Incidents.getAllIncidents()
    .then((response) => {
      const incidents = response.map((incident) => {
        const sources = Incidents.createSourcesArray(incident.incident_id);
        return {
          ...incident,
          src: sources,
        };
      });
      res.json(incidents);
    })
    .catch((err) => {
      res.status(500).json({ message: 'Request Error' });
    });
});
Here are the models I currently have:
async function getAllIncidents() {
  return await db('incidents');
}

async function createSourcesArray(incident_id) {
  const sources = await db('sources')
    .select('*')
    .where('sources.incident_id', incident_id);
  return sources;
}
When this endpoint is hit I get a "too many connections" error. Please advise.

I found a solution. I decided to query the two tables independently. Then I looped through the first result array and, within that loop, looped through the second array checking for the foreign key they share. When I found a match, I pushed that result onto an array on the original object, and finally returned a new array of the objects from the first query with the associated data from the second. The models are unchanged; here is the updated route.
router.get('/showallincidents', async (req, res) => {
  try {
    const incidents = await Incidents.getAllIncidents();
    const sources = await Incidents.getAllSources();
    const responseArray = [];
    // Reconstructs each incident object with its sources to send to the front end
    incidents.forEach((incident) => {
      incident['src'] = [];
      sources.forEach((source) => {
        if (source.incident_id === incident.incident_id) {
          incident.src.push(source);
        }
      });
      responseArray.push(incident);
    });
    res.json(responseArray);
  } catch (e) {
    res.status(500).json({
      message: 'Request Error'
    });
  }
});

Are the two tables in the same database? If so, it is much more efficient to do the primary/foreign key match with an SQL join. What you have implemented is a "nested loop join", which may not be the optimal way to match rows depending on the value distribution of the primary key. You can search for SQL join algorithms to see examples and pros/cons.
If the tables are in different databases, then a client-side join is likely your only option. Though again, if you know something about the underlying distribution, it might be better to do a hash join.
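For illustration, here is a minimal sketch of the single-query join approach, assuming the models use knex against PostgreSQL and that incident_id is the primary key of incidents (both inferred from the code above, not confirmed):

async function getAllIncidentsWithSources() {
  // One round trip: LEFT JOIN sources onto incidents and have PostgreSQL
  // aggregate each incident's sources into a JSON array.
  return db('incidents')
    .leftJoin('sources', 'sources.incident_id', 'incidents.incident_id')
    .select(
      'incidents.*',
      db.raw(
        "coalesce(json_agg(sources.*) filter (where sources.incident_id is not null), '[]'::json) as src"
      )
    )
    .groupBy('incidents.incident_id');
}

Grouping by the primary key lets PostgreSQL accept incidents.* in the select list, and the filter clause keeps incidents with no sources from ending up with a [null] array.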

Related

Creating a server-side script to fetch from

I want to fetch countries and cities from my front end, but I know that first I need to make a server-side script on the backend to be able to do so.
If geography is a mock package that lets me do so, and this is the code I have thus far, how could I prepare my backend to receive these fetch requests?
app.get('/:locations', function (req, res) {
  Geography.init().then(function () {
    console.log(Geography);
    Geography.open(req.params.url).then(function (site) {
      console.log(Geography);
      site.analyze().then(function (results) {
        res.json(results);
      });
    });
  });
});
Would it look something like this? (incomplete, of course....)
select
countries
cities
Taking the library I used as an example in the comments:
In the example provided in the readme, a JSON response is returned as an array of cities matching the filter (or close to it), such as [{}, {}, {}, ...].
So, if you wanted to form a response from this, you could just take some of the data. For example, if you wanted to return the country along with a latitude and longitude, you could do:
// let "cities" be the JSON response from the library
// cities[0] is the first object in the array from the response
res.json({
"country": cities[0].country,
"location": {
"lat": cities[0].loc[1], //longitude comes first
"lon": cities[0].loc[0]
}
});
You could implement this with your GET endpoint like:
app.get('/:locations', function (req, res) {
  const cities = citiesLibrary.filter(city => city.name.match(req.params.locations));
  // ...then form and send the response as shown above
});
But if you wanted to return all the results from the library, you could simply use:
res.json(cities);
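Putting those pieces together, a minimal sketch of the complete endpoint might look like this (citiesLibrary and its shape are assumptions carried over from the snippets above, not a real package API):

app.get('/:locations', function (req, res) {
  // Filter the library's city list by the route parameter (assumed shape).
  const cities = citiesLibrary.filter(city => city.name.match(req.params.locations));
  if (cities.length === 0) {
    return res.status(404).json({ message: 'No matching cities found' });
  }
  // Respond with the country and coordinates of the first match.
  res.json({
    "country": cities[0].country,
    "location": {
      "lat": cities[0].loc[1], // loc is [longitude, latitude]
      "lon": cities[0].loc[0]
    }
  });
});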

Make one column dependent on association Sequelize

I have a table called HOUSE. And it has a column named STATUS.
I also have a table called TASK and it also has a column named STATUS.
Each house has many tasks. If any task has a status of inProgress, the house status shall be inProgress; if all of the tasks are done, then the house is done.
I want the house's status column to be dependent on the statuses of all its tasks.
When I call /getHouses, here's what I do to add a property called status to each house object, because currently I have no STATUS column in the HOUSE table.
exports.getMyHouses = (req, res) => {
  const page = myUtil.parser.tryParseInt(req.query.page, 0)
  const limit = myUtil.parser.tryParseInt(req.query.limit, 10)
  db.House.findAndCountAll({
    where: { userId: req.user.id },
    include: [
      {
        model: db.Task,
        as: "task",
        include: [
          {
            model: db.Photo,
            as: "photos"
          }
        ]
      },
      {
        model: db.Address,
        as: "address"
      }
    ],
    offset: limit * page,
    limit: limit,
    order: [["id", "ASC"]],
  })
    .then(data => {
      let newData = JSON.parse(JSON.stringify(data))
      const houses = newData.rows
      for (let house of houses) {
        house.status = "done"
        const tasks = house.task
        for (let task of tasks) {
          if (task.status == "inProgress") {
            house.status = "inProgress"
            break
          }
        }
      }
      res.json(myUtil.response.paging(newData, page, limit))
    })
    .catch(err => {
      console.log("Error get houses: " + err.message)
      res.status(500).send({
        message: "An error has occurred while retrieving data."
      })
    })
}
EDIT: I just realized that perhaps I can update the house's status column each time there's an update in the task's status. I've never thought about this before.
But I would still love it if anyone could confirm that this is a good strategy or if there's a better one.
The option you have is viable as long as filtering by the house's status isn't something you require. This would essentially be called a virtual field (since it isn't stored directly in the database). If you do need to filter by this field, you'd then need to query for all the tasks that are inProgress and collect the unique house IDs.
You could update the house's status column on task update too, but you could run into race conditions if, for example, multiple requests were made to update tasks belonging to the same house. Make sure to run a transaction here if you do. Querying/filtering for houses with inProgress tasks would be much faster, since you can query the column directly; however, updates would be slower, since you'd need to run a task update, a count query on tasks, and an update query on the house.
Both have their pros and cons; it mainly depends on your application design's requirements.
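For illustration, a minimal sketch of the update-on-task-change approach inside a Sequelize transaction might look like the following (this assumes a status column has been added to HOUSE as the edit proposes; the houseId foreign key name and the db.sequelize handle are also assumptions, so adjust to your schema):

async function updateTaskStatus(taskId, newStatus) {
  return db.sequelize.transaction(async (t) => {
    const task = await db.Task.findByPk(taskId, { transaction: t });
    // Lock the parent house row so concurrent task updates for the
    // same house serialize instead of racing on the count below.
    await db.House.findByPk(task.houseId, { transaction: t, lock: t.LOCK.UPDATE });
    await task.update({ status: newStatus }, { transaction: t });
    // Recompute the house status from its tasks.
    const inProgress = await db.Task.count({
      where: { houseId: task.houseId, status: "inProgress" },
      transaction: t,
    });
    await db.House.update(
      { status: inProgress > 0 ? "inProgress" : "done" },
      { where: { id: task.houseId }, transaction: t }
    );
    return task;
  });
}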

How to avoid two concurrent API requests breaking the logic behind document validation?

I have an API where, in order to insert a new item, the item needs to be validated. The validation is basically a type validator (string, number, Date, etc.) plus a query to the database that checks whether the "user" already has an "item" on the same date; if it does, the validation fails.
Pseudocode goes like this:
const Item = require("./models/item");
async function post(newDoc) {
  let errors = await checkForDocErrors(newDoc);
  if (errors) {
    throw errors;
  }
  let itemCreated = await Item.create(newDoc);
  return itemCreated;
}
My problem is if I make two concurrent requests like this:
const request = require("superagent");

// Inserts a new Item
request.post('http://127.0.0.1:5000/api/item')
  .send({
    "id_user": "6c67ea36-5bfd-48ec-af62-cede984dff9d",
    "start_date": "2019-04-02",
    "name": "Water Bottle"
  })

/*
  Inserts a new Item, which it shouldn't, resulting in two items
  having the same date.
*/
request.post('http://127.0.0.1:5000/api/item')
  .send({
    "id_user": "6c67ea36-5bfd-48ec-af62-cede984dff9d",
    "start_date": "2019-04-02",
    "name": "Toothpick"
  })
Both will be successful, which they shouldn't be, since a "user" cannot have two "items" on the same date.
If I execute the second one after the first has finished, everything works as expected.
request.post('http://127.0.0.1:5000/api/item') // Inserts a new Item
  .send({
    "id_user": "6c67ea36-5bfd-48ec-af62-cede984dff9d",
    "start_date": "2019-04-02",
    "name": "Water Bottle"
  })
  .then((res) => {
    // It is not successful since there is already an item with that date,
    // as expected
    request.post('http://127.0.0.1:5000/api/item')
      .send({
        "id_user": "6c67ea36-5bfd-48ec-af62-cede984dff9d",
        "start_date": "2019-04-02",
        "name": "Toothpick"
      })
  })
To avoid this I currently send one request with an array of documents, but I want to prevent this issue, or at least make it less likely to happen.
SOLUTION
I created a redis server, used the package redis-lock, and wrapped it around the POST route.
var client = require("redis").createClient();
var lock = require("redis-lock")(client);
var itemController = require('./controllers/item');

router.post('/', function (req, res) {
  let userId = "";
  if (typeof req.body === 'object' && typeof req.body.id_user === 'string') {
    userId = req.body.id_user;
  }
  // Serialize handling per user so concurrent inserts for the same user
  // run one at a time.
  lock('POST ' + req.path + userId, async function (done) {
    try {
      let result = await itemController.post(req.body);
      res.json(result);
    } catch (e) {
      res.status(500).send("Server Error");
    }
    done();
  });
});
Thank you.
Explanation
That is a race condition:
two or more threads can access shared data and they try to change it at the same time
What is a race condition?
Solution:
There are many ways to prevent conflicting data in this case; a lock is one option.
You can lock at the application level or at the database level, but I suggest you read this thread before choosing either of them:
Optimistic vs. Pessimistic locking
Quick solution: a pessimistic lock, e.g. https://www.npmjs.com/package/redis-lock
You should create a composite index or a composite primary key that includes the id_user and the start_date fields. This will ensure that no two documents for the same user with the same date can be created, and the database will throw an error if you try to do it.
Composite index with mongoose
You could also use transactions. To do that, you would execute the find and the create methods inside a transaction, to ensure that no concurrent queries on the same document are executed.
Mongoose transactions tutorial
More info
I would go with a unique composite index, which in your specific case should be something like:
mySchema.index({ id_user: 1, start_date: 1 }, { unique: true });
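With that unique index in place, the database itself arbitrates the race: whichever concurrent insert lands second is rejected. A minimal sketch of handling that rejection in the pseudocode from the question (assuming MongoDB/Mongoose, where duplicate-key errors carry code 11000):

async function post(newDoc) {
  let errors = await checkForDocErrors(newDoc);
  if (errors) {
    throw errors;
  }
  try {
    return await Item.create(newDoc);
  } catch (err) {
    // The unique index rejects the second concurrent insert;
    // translate the duplicate-key error into a validation error.
    if (err.code === 11000) {
      throw new Error("User already has an item on that date");
    }
    throw err;
  }
}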

How to chain groups of observables that previously were each one in a forkjoin() operation

I have an orders table in MySQL. Each order has a number of documents associated with it, whether they are quotes, invoices, etc., so there is a second table called "documents", which has a "document_id" primary key and an "order_id" foreign key. In a similar fashion, I have another table for the different checks that technicians perform on every vehicle, and another table for vehicle pictures. I am creating a web service using Node and Express that needs to return JSON similar to this:
[
  {
    "order_id": 1003,
    "customer_id": 8000,
    "csi": 90,
    "date_admitted": "2016-10-28T05:00:00.000Z",
    "plates": "YZG-5125",
    ...
    "documents": {
      "type": "invoice",
      "number": "1234",
      ...
    },
    "checks": {
      "scanner": "good",
      "battery": "average",
      ...
    },
    "vehicle_pictures": {
      "title": "a title...",
      "path": "the file path"
      ...
    }
  },
  {
    ...
  },
  ...
]
As you can see, it is necessary to do three queries for each order: one for checks, one for documents, and a third for pictures. Then I need to add these sub-results to the order and finally return the array in the response.
This would be a very easy task in the old world of synchronous programming; however, due to the asynchronous nature of the query() method in the connection object of the mysql library, this threatens to become a real hell.
In a situation where I had to process a single order, using the RxJS library on the server with forkJoin() would suffice to process all three results at once. What I am not sure about is how to "chain" every order (with a forkJoin managing the 3 queries) so that everything gets processed and at the end I can call res.json(result) with everything neatly assembled.
Note: I want to solve this with RxJS instead of using a sync library package like node-mysql-libmysqlclient. The reason, basically, is that the "right" way to do this in an async language like Node.js is to go async. Also, I want to use RxJS and not async, q promises, or any other library, since Observables seem to be the absolute winner in the async solutions contest, and I want to be consistent in all the solutions I develop, so this question is mostly oriented toward RxJS masters.
Also, every single question I have found that is similar to this has the classical "purist" reply saying that if you are using Node you "should" use asynchronous code and not think in synchronous solutions. So this is a challenge for those who defend that position, since this (I think) is one of those cases where sync in Node makes sense. However, I really want to learn how to do this with RxJS instead of concluding that it is impossible, which I am sure it is not.
If I understood things correctly, you have some data that you want to use to gather additional data from the database via async operations. You want to build a combined dataset consisting of the original data and the additional information that the subsequent queries return.
As you have mentioned, you can use forkJoin to wait for multiple operations to complete before proceeding. You have to do this for each item in the data sequence and then use switchMap to merge the result back into the original stream.
Have a look at the following example jsbin that demonstrates how this can be done:
const data = [
  { id: 1, init: 'a' },
  { id: 2, init: 'b' },
  { id: 3, init: 'c' }
];

function getA(id) {
  return Rx.Observable.timer(1000)
    .map(() => {
      return { id, a: 'abc' };
    })
    .toPromise();
}

function getB(id) {
  return Rx.Observable.timer(1500)
    .map(() => {
      return { id, b: 'def' };
    })
    .toPromise();
}

Rx.Observable.interval(5000)
  .take(data.length)
  .map(i => data[i])
  .do(data => { console.log(`query id ${data.id}`); })
  .switchMap((data) => {
    return Rx.Observable.forkJoin(getA(data.id), getB(data.id), (a, b) => {
      console.log(`got results for id ${data.id}`);
      return Object.assign({}, data, a, b);
    });
  })
  .subscribe(x => console.log(x));
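Applied to the orders case, a sketch inside the Express handler might look like this, once the initial orders query has resolved to an orders array (getDocuments, getChecks, and getPictures are assumed promise-returning query helpers, not part of the original code):

Rx.Observable.from(orders)
  .mergeMap(order =>
    // Run the three per-order queries concurrently and merge the results
    // back onto the order object.
    Rx.Observable.forkJoin(
      getDocuments(order.order_id),
      getChecks(order.order_id),
      getPictures(order.order_id),
      (documents, checks, vehicle_pictures) =>
        Object.assign({}, order, { documents, checks, vehicle_pictures })
    )
  )
  .toArray() // collect every enriched order into a single array
  .subscribe(
    result => res.json(result),
    err => res.status(500).json({ message: err.message })
  );

mergeMap runs the queries for all orders concurrently; if the output order must match the input order, use concatMap instead.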

Properly chaining RethinkDB table and object creation commands with rethinkdbdash

I am processing a stream of text data where I don't know ahead of time what the distribution of its values is, but I know each record looks like this:
{
  "datetime": "1986-11-03T08:30:00-07:00",
  "word": "wordA",
  "value": "someValue"
}
I'm trying to bucket it into RethinkDB objects based on its value, where the objects look like the following:
{
  "bucketId": "1",
  "bucketValues": {
    "wordA": [
      {"datetime": "1986-11-03T08:30:00-07:00"},
      {"datetime": "1986-11-03T08:30:00-07:00"}
    ],
    "wordB": [
      {"datetime": "1986-11-03T08:30:00-07:00"},
      {"datetime": "1986-11-03T08:30:00-07:00"}
    ]
  }
}
The purpose is to eventually count the number of occurrences for each word in each bucket.
Since I'm dealing with about a million buckets and have no knowledge of the words ahead of time, the plan is to create these objects on the fly. I am new to RethinkDB, however, and I have tried my best to do this in such a way that I don't attempt to add a word key to a bucket that doesn't exist yet, but I am not entirely sure if I'm following best practice here in chaining the commands as follows (note that I am running this on a Node.js server using rethinkdbdash):
var bucketId = "someId";
var word = "someWordValue";

r.do(r.table("buckets").get(bucketId), function (result) {
  return r.branch(
    // If the bucket doesn't exist
    result.eq(null),
    // Create it
    r.table("buckets").insert({
      "id": bucketId,
      "bucketValues": {}
    }),
    // Else do nothing
    "Bucket already exists"
  );
})
  .run()
  .then(function (result) {
    console.log(result);
    r.table("buckets").get(bucketId)
      .do(function (bucket) {
        return r.branch(
          // If the word already exists
          bucket("bucketValues").keys().contains(word),
          // Just append to it (code not implemented yet)
          "Word already exists",
          // Else create the word and append it
          r.table("buckets").get(bucketId).update(
            {"bucketValues": r.object(word, [/*Put the timestamp here*/])}
          )
        );
      })
      .run()
      .then(function (result) {
        console.log(result);
      });
  });
Do I need to execute run here twice, or am I way off base on how you're supposed to properly chain things together with RethinkDB? I just want to make sure I'm not doing this the wrong/hard way before I get much deeper into this.
You don't have to execute run multiple times; it depends on what you want. Basically, run() ends the chain and sends the query to the server, so we do all the work of building the query and end it with run() to execute it. If you use run() two times, that means two round trips to the server.
So if we can do all the processing using only RethinkDB functions, we need to call run only once. However, if we want to do some kind of post-processing of the data on the client side, then we have no choice. Usually I try to do all processing in RethinkDB: with control structures, looping, and anonymous functions we can go pretty far without letting the client do any logic.
In your case, the query can be rewritten in Node.js, using the official driver:
var r = require('rethinkdb');

var bucketId = "someId2";
var word = "someWordValue2";

r.connect()
  .then((conn) => {
    r.table("buckets").insert({
      "id": bucketId,
      "bucketValues": {}
    })
    .do((result) => {
      // We don't care about the result at all;
      // we just want to ensure the bucket is there
      return r.table('buckets').get(bucketId)
        .update(function (bucket) {
          return {
            'bucketValues': r.object(
              word,
              bucket('bucketValues')(word).default([])
                .append(r.now()))
          };
        });
    })
    .run(conn)
    .then((result) => { conn.close(); });
  });
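Since the question mentions rethinkdbdash, here is a hedged sketch of the same single-round-trip idea with that driver, using insert's conflict-resolution function (a ReQL feature available since RethinkDB 2.3; names are carried over from the question):

const r = require('rethinkdbdash')(); // rethinkdbdash manages its own connection pool

function addWordOccurrence(bucketId, word) {
  return r.table('buckets').insert(
    { id: bucketId, bucketValues: r.object(word, [r.now()]) },
    {
      // On a primary-key conflict, merge a new timestamp into the existing
      // word array, creating the array if this word is new to the bucket.
      conflict: (id, oldDoc, newDoc) =>
        oldDoc.merge({
          bucketValues: r.object(
            word,
            oldDoc('bucketValues')(word).default([]).append(r.now())
          )
        })
    }
  ).run(); // no connection argument needed with rethinkdbdash
}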
