After reading https://stackoverflow.com/a/14797359/4158593 about Node.js being single-threaded: an async function takes its first parameter, processes it, and then uses the callback to respond when everything is ready. What confused me is what happens if I have multiple queries that need to be executed all at once, and I want to tell Node.js to block other requests by adding them to a queue.
To do that I realised that I need to wrap my queries in another callback. And promises do that pretty well.
const psqlClient = psqlPool.connect();
return psqlClient.query(`SELECT username FROM usernames WHERE username=$1`, ['me'])
  .then((data) => {
    if (!data.rows[0].username) {
      psqlClient.query(`INSERT INTO usernames (username) VALUES ('me')`);
    } else { ... }
  });
This code is used during sign-up, to check that a username isn't taken before inserting it. So it is very important that Node.js puts other requests into a queue and runs the SELECT and the INSERT together, because otherwise two requests with the same username sent at the same time could both pass the SELECT check, and the same username would be inserted twice.
Questions
Does the code above execute the queries all at once?
If 1 is correct, and I were to change the code like this:
const psqlClient = psqlPool.connect();
return psqlClient.query(`SELECT username FROM usernames WHERE username=$1`, ['me'], function(err, reply) {
  if (!reply.rows[0].username) {
    psqlClient.query(`INSERT INTO usernames (username) VALUES ('me')`);
  }
});
would that affect the behaviour?
If 1 is wrong, how should this be solved? I am going to need this pattern (mainly a SELECT followed by an INSERT/UPDATE) for things like making sure that my XML sitemaps don't contain more than 50,000 URLs, by storing the count for each file in my DB, which happens dynamically.
The only thing that can guarantee data integrity in your case is a single SELECT -> INSERT query, which has been discussed here many times.
Some examples:
Is SELECT or INSERT in a function prone to race conditions?
Get Id from a conditional INSERT
You should be able to find more of that here ;)
I also touched on this subject in a SELECT ⇒ INSERT example within pg-promise.
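For illustration only (this is not code from those links), a single-statement version for the usernames table from the question could look roughly like the sketch below; it relies on a UNIQUE constraint on username to be truly race-free:
// Rough sketch, not from the linked answers: the SELECT and the INSERT happen
// in one statement, so two concurrent sign-ups cannot both pass a separate
// "username is free" check. A UNIQUE constraint on usernames.username is assumed.
psqlClient.query(
  `INSERT INTO usernames (username)
   SELECT $1
   WHERE NOT EXISTS (SELECT 1 FROM usernames WHERE username = $1)`,
  ['me']
).then((result) => {
  // result.rowCount === 1 -> the name was free and has been inserted
  // result.rowCount === 0 -> the name was already taken
});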
There is, however, an alternative: make any repeated insert generate a conflict, in which case you can re-run your SELECT to get the new record. But it is not always a suitable solution.
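A minimal sketch of that conflict-based alternative (again assuming a UNIQUE constraint on username; ON CONFLICT needs PostgreSQL 9.5+):
// Sketch only: let the database reject the duplicate instead of checking first.
// Assumes PostgreSQL 9.5+ and a UNIQUE constraint on usernames.username.
psqlClient.query(
  `INSERT INTO usernames (username) VALUES ($1)
   ON CONFLICT (username) DO NOTHING`,
  ['me']
).then((result) => {
  if (result.rowCount === 0) {
    // Nothing was inserted: the username already exists,
    // so re-run the SELECT to get the existing record.
    return psqlClient.query(`SELECT username FROM usernames WHERE username = $1`, ['me']);
  }
});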
Here's a reference from the creator of node-postgres: https://github.com/brianc/node-postgres/issues/83#issuecomment-212657287. Basically, queries are queued, but don't rely on that in production where you have many requests.
However, you can use BEGIN and COMMIT:
var Client = require('pg').Client;
var client = new Client(/* your connection info goes here */);
client.connect();

var rollback = function(client) {
  // terminating a client connection will
  // automatically rollback any uncommitted transactions
  // so while it's not technically mandatory to call
  // ROLLBACK it is cleaner and more correct
  client.query('ROLLBACK', function() {
    client.end();
  });
};

client.query('BEGIN', function(err, result) {
  if (err) return rollback(client);
  client.query('UPDATE account SET money = money + 100 WHERE id = $1', [1], function(err, result) {
    if (err) return rollback(client);
    client.query('UPDATE account SET money = money - 100 WHERE id = $1', [2], function(err, result) {
      if (err) return rollback(client);
      // disconnect after successful commit
      client.query('COMMIT', client.end.bind(client));
    });
  });
});
Check out: https://github.com/brianc/node-postgres/wiki/Transactions
However, this doesn't lock the table. Here's a list of solutions: Update where race conditions Postgres (read committed)
Related
I was asked to import a CSV file from a server daily and parse the respective headers into the appropriate fields in mongoose.
My first idea was to make it run automatically with a scheduler, using the cron module.
const CronJob = require('cron').CronJob;
const fs = require("fs");
const csv = require("fast-csv")
new CronJob('30 2 * * *', async function() {
await parseCSV();
this.stop();
}, function() {
this.start()
}, true);
Next, the parseCSV() function code is as follows (I have simplified some of the data):
function parseCSV() {
  let buffer = [];
  let stream = fs.createReadStream("data.csv");
  csv.fromStream(stream, { headers: ["lot", "order", "cwotdt"], trim: true })
    .on("data", async (data) => {
      // Only add products that fulfil the following condition
      if (data.cwotdt !== "000000") {
        let product = { "order": data.order, "lot": data.lot };
        // Check whether the product exists in the database or not
        await db.Product.find(product, function(err, foundProduct) {
          if (foundProduct && foundProduct.length !== 0) {
            console.log("Product exists");
          } else {
            buffer.push(product);
            console.log("Product does not exist");
          }
        });
      }
    })
    .on("end", function() {
      db.Product.find({}, function(err, productAvailable) {
        // Check whether the collection is empty or not
        if (productAvailable.length !== 0) {
          // Add subsequent records
          db.Product.insertMany(buffer);
          buffer = [];
        } else {
          // Add for the first time
          db.Product.insertMany(buffer);
          buffer = [];
        }
      });
    });
}
It is not a problem with just a few rows in the CSV file, but at only around 2k rows I ran into a problem. The culprit is the if-condition check inside the data event handler: it needs to query the database for every single row to see whether the data is already there or not.
The reason I'm doing this is that the CSV file will have new data added to it, and I need to add all the data the first time if the database is empty, or otherwise look at every single row and only add the new data into mongoose.
The 1st approach I took, from here (as in the code), was using async/await to make sure that all the data had been read before proceeding to the end event handler. This helps, but I see from time to time (with mongoose.set("debug", true);) that some data is queried twice, and I have no idea why.
The 2nd approach was not to use async/await. This has a downside: since the data was not fully queried, it proceeded straight to the end event handler, and insertMany only got the part of the data that had already been pushed into the buffer.
If I stick with the current approach it is not an issue, but the querying takes 1 to 2 minutes, not to mention even more as the database keeps growing. During those few minutes of querying, the event queue is blocked, so requests sent to the server time out.
I used stream.pause() and stream.resume() before this code, but I can't get it to work: it just jumps straight to the end event handler first. This causes the buffer to be empty every single time, since the end event handler runs before the data event handler.
I can't remember the links that I used, but the fundamentals I picked up came from this:
Import CSV Using Mongoose Schema
I saw these threads:
Insert a large csv file, 200'000 rows+, into MongoDB in NodeJS
Can't populate big chunk of data to mongodb using Node.js
to be similar to what I need, but they're a bit too complicated for me to understand what is going on. It seems they use sockets or a child process, maybe? Furthermore, I still need to check the conditions before adding into the buffer.
Anyone care to guide me on this?
Edit: await is removed from console.log as it is not asynchronous
Forking a child process approach (sketched below):
When the web service gets a request with a CSV data file, save it somewhere in the app
Fork a child process -> child process example
Pass the file url to the child process to run the insert checks
When the child process has finished processing the CSV file, delete the file
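A minimal sketch of that flow (untested; the worker script path ./csv-worker.js and the message shape are assumptions, not part of the original answer):
// Rough sketch of the forked-worker flow above. "./csv-worker.js" is a
// hypothetical script that parses the CSV and runs the insert checks,
// so the parent's event loop stays free for incoming requests.
const { fork } = require("child_process");
const fs = require("fs");

function processCsvInChild(filePath) {
  const worker = fork("./csv-worker.js");
  // Hand the file location to the child process.
  worker.send({ filePath });
  worker.on("message", (msg) => {
    if (msg && msg.done) {
      // The child finished processing the CSV file, so delete it.
      fs.unlink(filePath, () => {});
    }
  });
}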
As Joe said, indexing the DB would speed up the processing time by a lot when there are lots (millions) of tuples.
If you create an index on order and lot, the query should be very fast:
db.Product.createIndex({ order: 1, lot: 1 })
Note: this is a compound index and may not be the ideal solution. Index strategies
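Since the question is using mongoose, the equivalent declaration on the schema would look roughly like this (the schema name productSchema is an assumption, not something from the question):
// Hypothetical mongoose equivalent of the compound index above,
// assuming the Product model is built from a schema called productSchema.
productSchema.index({ order: 1, lot: 1 });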
Also, your await on console.log is weird; that may be causing your timing issues. console.log is not async. Additionally, the function is not marked async.
// removing await from console.log
let product = { "order": data.order, "lot": data.lot };
// Check whether the product exists in the database or not
await db.Product.find(product, function(err, foundProduct) {
  if (foundProduct && foundProduct.length !== 0) {
    console.log("Product exists");
  } else {
    buffer.push(product);
    console.log("Product does not exist");
  }
});
I would try removing the await on console.log (that may be a red herring if console.log is just for Stack Overflow and is hiding the actual async method). However, be sure to mark the function as async if that is the case.
Lastly, if the problem still exists, I would look into a 2-tiered approach (sketched below):
Insert all lines from the CSV file into a mongo collection.
Process that mongo collection after the CSV has been parsed, taking the CSV out of the equation.
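A rough, untested sketch of that two-tier idea, reusing the same fast-csv call as the question; the staging model db.RawCsvRow is an assumption, not part of the original code:
// Sketch only: tier 1 dumps every useful CSV row into a staging collection,
// tier 2 compares against Product in bulk instead of one find() per row.
function importCsvTwoTier() {
  let rows = [];
  let stream = fs.createReadStream("data.csv");
  csv.fromStream(stream, { headers: ["lot", "order", "cwotdt"], trim: true })
    .on("data", (data) => {
      if (data.cwotdt !== "000000") {
        rows.push({ "order": data.order, "lot": data.lot });
      }
    })
    .on("end", async () => {
      // Tier 1: one bulk insert of the raw rows into the staging collection.
      await db.RawCsvRow.insertMany(rows);
      // Tier 2: fetch the existing keys once, then insert only the new products.
      const existing = await db.Product.find({}, "order lot").lean();
      const seen = new Set(existing.map((p) => `${p.order}:${p.lot}`));
      const fresh = rows.filter((r) => !seen.has(`${r.order}:${r.lot}`));
      if (fresh.length) {
        await db.Product.insertMany(fresh);
      }
    });
}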
I want to build a real-time chat system for my project, but I have some problems with Redis because I want my data to be stored as well as possible.
My problem:
I'd like to use Socket.IO to do real-time chatting in a closed group (of two people), but how do I store the messages?
Redis is a key-value store, and that means that if I want to store something I need to assign a unique key to my data before storing it.
If the same user posts more than one message, which keys would I use inside Redis? I'm thinking about unique ids as unique keys, since I want to be able to fetch these messages when a user loads the chat page, but if I do that I need another structure that relates the message ids to the user that posted each message.
Am I forgetting anything? Is there a best method to do this?
Sorry for my bad English.
Redis is more than a key-value store.
So you want the following:
chat messages,
two-person discussions,
you did not mention time constraints, so let's assume that you archive messages after a while,
you also don't say whether you want separate "threads" between two people, like forums, or continuous messages, like Facebook. I'm assuming continuous.
For each user, you have to store the messages he sends. Let's say APP_NAMESPACE:MESSAGES:<USER_ID>:<MESSAGE_ID>. We add the userId here so that we can easily retrieve all messages sent by a single user.
And, for each pair of users, you need to track their conversation. As a key, you can simply use their user ids: APP_NAMESPACE:CONVERSATIONS:<USER1_ID>-<USER2_ID>. To make sure you always get the same, shared conversation for the two users, you can sort their ids alphabetically, so that users 132 and 145 will both have 132-145 as their conversation key.
So what to store in "conversations"? Let's use a list: [messageKey, messageKey, messageKey].
Ok, but what is the messageKey then? A combination of the userId above and a messageId (so we can get the actual message).
So basically, you need two things:
Store the message and give it an ID
Store a reference to this message to the relevant conversation.
With node and the standard redis/hiredis client this would be something like the following (I'll skip the obvious error checks etc., and I'll write ES6. If you cannot read ES6 yet, just paste it into Babel):
// assuming the init module connects to redis and exports the client
import client from './redis-init';
import uuid from 'node-uuid';

export function storeMessage(userId, toUserId, message) {
  return new Promise(function(resolve, reject) {
    // give it an id.
    let messageId = uuid.v4(); // gets us a random uid.
    let messageKey = `${userId}:${messageId}`;
    let key = `MY_APP:MESSAGES:${messageKey}`;
    client.hmset(key, [
      "message", message,
      "timestamp", Date.now(), // store a number so the sort in getConversation works
      "toUserId", toUserId
    ], function(err) {
      if (err) { return reject(err); }
      // Now we stored the message. But we also want to store a reference to the messageKey.
      // Sort the two ids so both users share the same conversation key (as in getConversation below).
      let [id1, id2] = [userId, toUserId].sort();
      let convoKey = `MY_APP:CONVERSATIONS:${id1}-${id2}`;
      client.lpush(convoKey, messageKey, function(err) {
        if (err) { return reject(err); }
        return resolve();
      });
    });
  });
}

// We also need to retrieve the messages for the users.
export function getConversation(userId, otherUserId, page = 1, limit = 10) {
  return new Promise(function(resolve, reject) {
    let [userId1, userId2] = [userId, otherUserId].sort();
    let convoKey = `MY_APP:CONVERSATIONS:${userId1}-${userId2}`;
    // let's sort out the paging stuff.
    let start = (page - 1) * limit; // we're zero-based here.
    let stop = page * limit - 1;
    client.lrange(convoKey, start, stop, function(err, messageKeys) {
      if (err) { return reject(err); }
      // we have the message keys, now get all the messages.
      let keys = messageKeys.map(key => `MY_APP:MESSAGES:${key}`);
      let promises = keys.map(key => getMessage(key));
      Promise.all(promises)
        .then(function(messages) {
          // now we have them. We can sort them too.
          return resolve(messages.sort((m1, m2) => m1.timestamp - m2.timestamp));
        })
        .catch(reject);
    });
  });
}

// We also need getMessage here as a promise. We could also have used some promisify implementation, but hey.
export function getMessage(key) {
  return new Promise(function(resolve, reject) {
    client.hgetall(key, function(err, message) {
      if (err) { return reject(err); }
      resolve(message);
    });
  });
}
Now that's crude and untested, but that's the gist of how you can do this.
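To illustrate how the two helpers above might be wired together (the user ids and the message text here are made up):
// Made-up ids, just to show the call pattern of the helpers above.
storeMessage('132', '145', 'Hello there!')
  .then(() => getConversation('132', '145', 1, 10))
  .then((messages) => console.log(messages))
  .catch((err) => console.error(err));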
Is Redis a constraint in your project? If not, you can go through this: http://autobahn.ws/python/wamp/programming.html
I have a mongodb Relationships collection that stores the user_id and the followee_id (the person the user is following). If I query against the user_id I can find all the individuals the user is following. Next I need to query the Users collection with all of the returned followee ids to get their personal information. This is where I'm confused. How would I accomplish this?
NOTE: I know I can embed the followees in the individual user's document and use an $in operator, but I do not want to go this route. I want to maintain as much flexibility as I can.
You can use an $in query without denormalizing the followees on the user. You just need to do a little bit of data manipulation:
Relationship.find({ user_id: user_id }, function(error, relationships) {
  var followee_ids = relationships.map(function(relationship) {
    return relationship.followee_id;
  });
  User.find({ _id: { $in: followee_ids } }, function(error, users) {
    // voila
  });
});
If I got your problem right (I think so):
you need to query each of the "individuals the user is following".
That means running multiple queries against the database, one per followee, to get the data.
Because queries in node.js (I assume you are using mongoose) are asynchronous, you need to make your code more asynchronous for this task.
If you are not familiar with the async module in node.js, it's about time to get to know it.
See npm async for the docs.
I made you a sample of how the query could look:
/* array of followee_id values from the previous query */
function query(followee_id_arr, callback) {
  var async = require('async');
  var allResults = [];
  async.eachSeries(followee_id_arr, function(f_id, done) {
    db.userCollection.findOne({ _id: f_id }, { _id: 1, personalData: 1 }, function(err, data) {
      if (err) { return done(err); /* handle error */ }
      allResults.push(data);
      done();
    });
  }, function(err) {
    /* handle err if needed */
    callback(null, allResults);
  });
}
You can even run all the queries in parallel (for better performance) by using async.map.
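A rough, untested sketch of that parallel version, using the same db.userCollection as in the code above:
// Sketch only: same lookup as query() above, but all findOne calls run in parallel.
// async.map keeps the results in the same order as followee_id_arr.
function queryParallel(followee_id_arr, callback) {
  var async = require('async');
  async.map(followee_id_arr, function(f_id, done) {
    db.userCollection.findOne({ _id: f_id }, { _id: 1, personalData: 1 }, done);
  }, callback); // callback(err, allResults)
}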
I use node.js sqlite3 to manipulate data. I use this code to insert data into the database and get the inserted id:
db.run("INSERT INTO myTable (name) VALUES ('test')");
db.get("SELECT last_insert_rowid() as id", function (err, row) {
console.log('Last inserted id is: ' + row['id']);
});
I think this is not stable. My DB connection is always open. When my server runs this code for multiple simultaneous connections from clients, does SELECT last_insert_rowid() get the id correctly?
Is sqlite last_insert_rowid atomic?
thanks.
According to the documentation for sqlite3 Database#run(sql, [param, ...], [callback]), you can retrieve lastID from the callback:
try {
  db.run("INSERT INTO TABLE_NAME VALUES (NULL,?,?,?,?)", data1, data2, data3, data4, function(err) {
    if (err) {
      callback({ "status": false, "val": err });
    } else {
      console.log("val " + this.lastID);
      callback({ "status": true, "val": "" });
    }
  });
} catch (ex) {
  callback({ "status": false, "val": ex });
}
this.lastID returns the last inserted row id, as described in the sqlite3 documentation for Database#run(sql, [param, ...], [callback]).
For the last inserted id with Node.js sqlite3 you can use this.lastID.
Here is a simple example:
db.run("INSERT INTO foo ...", function(err) {
if(null == err){
// row inserted successfully
console.log(this.lastID);
} else {
//Oops something went wrong
console.log(err);
}
});
last_insert_rowid() returns the ROWID for the last insert operation on this connection.
The result is unpredictable if the function is called from multiple threads on the same database connection.
Documentation (for the C API):
https://www.sqlite.org/c3ref/last_insert_rowid.html
If you don't share your database connection (session) between multiple threads for concurrent inserts, this is safe. If multiple threads insert on the same connection, this is unsafe, i.e. you might get either ID or a completely invalid ID.
this.lastID will give you the last inserted ID.
I have a whole bunch of fields for each user in my redis database, and I want to be able to retrieve all their records and display them.
The way I do it is to store a set of all user ids. When I want all their records, I recursively iterate over the set, grabbing their records using the user ids in the set and adding them to a global array, then finally returning this global array. Anyway, I don't particularly like this method and would like to hear some suggestions for alternatives; I feel there must be better functionality in node.js or redis for this very problem. Maybe there is a way to do away with the set entirely, but looking around I couldn't see anything obvious.
This is an example of my pseudo-ish (pretty complete) node.js code; note the set size is not a problem as it will rarely be > 15.
Register Function:
var register = function(username, passwordhash, email){
  // Get a new ID by incrementing idcounter
  redis.incr('db:users:idcounter', function(err, userid){
    // Set up the user hash with the user information, using the new userid as key
    redis.hmset('db:user:'+userid, {
      'username': username,
      'passwordhash': passwordhash,
      'email': email
    }, function(err, reply){
      // Add userid to the complete list of all users
      redis.sadd('db:users:all', userid);
    });
  });
};
Records retrieval function:
var getRecords = function(fcallback){
  // Grab a list of all the ids
  redis.smembers('db:users:all', function(err, allusersids){
    // Empty the returned (global) array
    completeArray = [];
    // Start the recursive function on the allusersids array.
    recursive_getNextUserHash(allusersids, fcallback);
  });
};
Recursive function used to retrieve individual records:
// Global complete array (so the recursive function has access)
var completeArray = [];

// Recursive method for filling up our completeArray
var recursive_getNextUserHash = function(userArray, callback){
  // If userArray.length == 0 we have cycled the entire list;
  // call the callback and pass it the completeArray, which
  // is now full of our usernames + emails
  if(userArray.length == 0){
    callback.apply(this, [completeArray]);
    return;
  }
  // If there are still more items, start by popping the next user
  var userid = userArray.pop();
  // Grab this user's information
  redis.hvals('db:user:'+userid, function(err, fields){
    // Add the user's information to the global array
    completeArray.push({ username: fields[0], email: fields[2] });
    // Now move on to the next user
    recursive_getNextUserHash(userArray, callback);
  });
};
Usage would be something like this:
register('bob', 'ASDADSFASDSA', 'bob@example.com');
register('bill', 'DDDASDADSAD', 'bill@example.com');
getRecords(function(records){
  for(var i = 0; i < records.length; i++){
    console.log("u:" + records[i]['username'] + ',@:' + records[i]['email']);
  }
});
Summary: What is a good way to retrieve many fields from hashes using node.js and redis? After writing this question, I started to wonder if this is just the way you do it in redis: you make many round trips. Regardless, if this is the case, there must be a way to avoid the horrible recursion!
Assuming you are using https://github.com/mranney/node_redis - have a look at Multi and Exec. You can send all of your commands in a single request and wait for all the responses at once. No need for recursion.
For anyone else having a similar question, here is the syntax I ended up using:
redis.smembers('db:users:all', function(err, reply){
  var multi = redis.multi();
  for(var i = 0; i < reply.length; i++){
    multi.hmget('db:user:' + reply[i], ['username', 'email']);
  }
  multi.exec(function(err, replies){
    for(var j = 0; j < replies.length; j++){
      console.log("-->" + replies[j]);
    }
  });
});