How to get Vertica copy from stdin response in NodeJS? - node.js

I'm using Vertica Database 07.01.0100 and node.js v0.10.32. I'm using the vertica nodejs module by vanberger. I want to send a copy from stdin command, and that is working using this example: https://gist.github.com/soldair/5168249. Here's my code:
var loadStreamQuery = "COPY \"" + input('table-name') + "\" FROM STDIN DELIMITER ',' skip 1 direct;"
var stream = through();
connection.copy(loadStreamQuery, function(transfer, success, fail){
    stream.on('data', function(data){
        log.info("loaddata: on data =>", data);
        transfer(data);
    });
    stream.on('end', function(data){
        log.info("loaddata: on end =>", data);
        if(data) {
            transfer(data);
        }
        success();
        callback(null, {'result': {'status': '200', 'result': "Data was loaded successfully into Vertica"}});
    });
    stream.on('error', function(err){
        fail();
        log.error("loaddata: on error =>", err);
        connection.disconnect();
    });
    stream.write(new Buffer(file));
    stream.end();
});
But, if the data file has more columns than the target table, it doesn't say so. It just happily runs, copying nothing and then ends. When I look at the table, nothing has been loaded. If I do the same thing in dbvisualizer, it tells me that 0 rows were affected.
I would like to examine the status of the command, but I don't know how. Is there some other event that I need to listen for? Do I need to save the result of copy to a variable and listen there, like I do with query calls? I'm a nodejs noob, so if the answer is obvious, just let me know.
Thanks!

I don't really think it is a node.js thing as much as it is a Vertica thing.
You need to look for rejected rows. You can find some good examples in the docs here.
If you want to actually see the rows that were rejected, you can do so with a COPY statement clause like REJECTED DATA AS TABLE "loader_rejects". Alternatively you can send them to a file on the cluster. I'm not aware of a way to get rejected rows to a local file when using STDIN.
If you don't care at all about the actual data, and just want to know how many rows loaded and rejected... you can use GET_NUM_REJECTED_ROWS() and GET_NUM_ACCEPTED_ROWS(). I think COPY will actually also return a result set with just the count of loaded rows, at least that is what I've noticed in the past.
So I guess as an example, if you want to see how many rows were accepted and rejected, you could do:
connection.query "SELECT GET_NUM_REJECTED_ROWS() AS REJECTED_ROWS, GET_NUMBER_ACCEPTED_ROWS() AS ACCEPTED_ROWS", (err, resultset) -> log.info( err, resultset.fields, resultset.rows, resultset.status )

Related

How to handle multiple database connections for 2 or 3 SELECT queries in AWS Lambda with nodejs?

The lambda's job is to see if a query returns any results and alert subscribers via an SNS topic. If no rows are returned, all good, no action needed. This has to be done every 10 minutes.
For some reason, I was told that we can't add any triggers on the database, and no on-prem environment is suitable to host a cron job.
Here comes lambda.
This is what I have in the handler, inside a loop for each database.
sequelize.authenticate()
    .then(() => {
        for (let j = 0; j < database[i].rawQueries.length; j++) {
            sequelize.query(database[i].rawQueries[j]).then(results => {
                if (results[0].length > 0) {
                    let message = "Temporary message for testing purposes" // + query results
                    publishSns("Auto Query Alert", message)
                }
            }).catch(err => {
                publishSns("Auto Query SQL Error", `The following query could not be executed: ${database[i].rawQueries[j]}\n${err}`)
            })
        }
    })
    .catch(err => {
        publishSns("Auto Query DB Connection Error", `The following database could not be accessed: ${databases[i].database}\n${err}`)
    })
    .then(() => sequelize.close())

// sns publisher
function publishSns(subject, message) {
    const params = {
        Message: message,
        Subject: subject,
        TopicArn: process.env.SNStopic
    }
    SNS.publish(params).promise()
}
I have 3 separate database configurations, and for those few SELECT queries, I thought I could just loop through the connection instances inside a single lambda.
The process is asynchronous and takes 9 to 12 seconds per invocation, which I assume is far from optimal.
The whole thing feels very, very sub-optimal, but that's my current level :)
To make things worse, I now read that lambda and sequelize don't really play well together.
I am using sequelize because that's the only way I could get 3 connections to the database in the same invocation to work without issues. I tried the mssql and tedious packages and wasn't able to with either of them.
It now feels like using an ORM is overkill for this very simple task of a SELECT query, and I would really like to at least have the connections and their queries run asynchronously to save some execution time.
I am looking into different ways to accomplish this, and I went down the rabbit hole and now have more questions than before! Generators? Are they still useful? Observables with RxJS? Could this apply here? Async/await or just promises? Do I even need sequelize?
Any guidance/opinion/criticism would be much appreciated.
I'm not familiar with sequelize.js, but I hope I can help. I don't know your level with RxJS and Observables, but it's worth a try.
I think you could definitely use Observables and RxJS.
I would start with an interval() that will run the code at whatever period you define.
You can then pipe the interval, since it's an Observable: do the auth bit, then do a map() to get an array of Observables, one per .query call. (I'm assuming all your calls, authenticate and query, are Promises, so they can be turned into Observables with from().) You can then use something like forkJoin() with that array to get a response once all the calls are done.
In the .subscribe at the end, you would call publishSns().
You can pipe a catchError() too and process errors there.
The map() part might not even be necessary; you could build that array beforehand and store it in a variable, since it doesn't depend on the authenticate() value.
I'm certain my solution isn't the only one or the best, but I think it would work.
Hope it helps, and let me know if it works!
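For what it's worth, here is a rough sketch of that shape in RxJS 6 style. It assumes rawQueries is a plain array of SQL strings and that sequelize and publishSns exist as in the question; everything else is illustrative, not the one true way to wire it up:

const { interval, from, forkJoin, of } = require('rxjs')
const { switchMap, catchError } = require('rxjs/operators')

// Run the whole check on a timer (in Lambda you would more likely let the
// scheduled trigger play this role instead of interval()).
interval(10 * 60 * 1000).pipe(
    // from() turns the authenticate() promise into an Observable.
    switchMap(() => from(sequelize.authenticate())),
    // One Observable per query, all run in parallel; forkJoin waits for all of them.
    switchMap(() => forkJoin(
        rawQueries.map(q =>
            from(sequelize.query(q)).pipe(
                catchError(err => {
                    publishSns("Auto Query SQL Error", `${q}\n${err}`)
                    return of(null) // keep the other queries going
                })
            )
        )
    ))
).subscribe(resultSets => {
    resultSets
        .filter(r => r && r[0].length > 0)
        .forEach(() => publishSns("Auto Query Alert", "Temporary message for testing purposes"))
})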

What is the most efficient way to keep writing a frequently changing JavaScript object to a file in NodeJS?

I have a JavaScript object with many different properties, and it might look something like this:
var myObj = {
    prop1: "val1",
    prop2: [...],
    ...
}
The values in this object keep updating very frequently (several times every second) and there could be thousands of them. New values could be added, existing ones could be changed or removed.
I want to have a file that always has the updated version of this object. The simple approach for doing this would be just writing the entire object to the file all over again after each time that it changes like so:
fs.writeFileSync("file.json", JSON.stringify(myObj));
This doesn't seem very efficient for big objects that need to be written very frequently. Is there a better way of doing this?
You should use a database. Something simple like sqlite3 would be a good option. Have a table with just two columns, 'Key' and 'Value', and use it as a key-value store. You will gain advantages like transactions and better performance than a file, as well as simpler access.
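A minimal sketch of that key/value idea with the sqlite3 npm package (the state.db file name, the kv table and the helper names are just placeholders):

const sqlite3 = require('sqlite3')
const db = new sqlite3.Database('state.db')

db.run('CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)')

// Write (or overwrite) a single property instead of rewriting the whole object.
function setProp(key, value, cb) {
    db.run('INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)',
        [key, JSON.stringify(value)], cb)
}

// Read a single property back.
function getProp(key, cb) {
    db.get('SELECT value FROM kv WHERE key = ?', [key], function (err, row) {
        cb(err, row ? JSON.parse(row.value) : undefined)
    })
}

setProp('prop1', 'val1', function () {
    getProp('prop1', function (err, value) {
        console.log(value) // "val1"
    })
})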
Maintaining a file (on the filesystem) containing the current state of a rapidly changing object is surprisingly difficult. Specifically, setting things up so some other program can read the file at any time is the hard part. Why? At any time the file may be in the process of being written, so the reader can get inconsistent results.
Here's an outline of a good way to do this.
1) Write the file less often than every time the state changes. Whenever the state changes, call updateFile(myObj). It sets a timer for, let's say, 500ms, then writes the very latest state to the file when the timer expires. Something like this (not debugged):
let latestObj
let updateFileTimer = 0

function updateFile (myObj) {
    latestObj = myObj
    if (updateFileTimer === 0) {
        updateFileTimer = setTimeout(
            function () {
                /* write latestObj to the file */
                updateFileTimer = 0
            }, 500)
    }
}
This writes the latest state of your object to the file, but no more than every 500ms.
2) Inside that timeout function, write out a temporary file. When it's written, delete the existing file and rename the temp file to the existing file's name. Do all this asynchronously so the rest of your program won't have to wait for the filesystem to work. Your timeout function will look like this:
updateFileTimer = setTimeout(
    function () {
        /* write latestObj to the file */
        fs.writeFile("file.json.tmp",
            JSON.stringify(latestObj),
            function (err) {
                if (err) throw err;
                fs.unlink("file.json",
                    function (err) {
                        if (!err)
                            fs.renameSync("file.json.tmp", "file.json")
                    })
            })
        updateFileTimer = 0
    }, 500)
There's one more thing to worry about. There's a brief period of time between the unlink and the renameSync operation where the "file.json" file does not exist in the file system. So, any program you write that READs "file.json" needs to try again if the file isn't found.
If you use Linux, macOS, FreeBSD, or another UNIX-derived operating system for this code it will work well. Those operating systems' file systems allow one program to unlink a file while another program is reading it. If you're running it on a DOS-derived operating system like Windows, the unlink operation will fail when another program is reading the file.
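On the reading side, that retry could look roughly like this (the 50ms delay and the 5-attempt cap are arbitrary choices, not requirements):

const fs = require('fs')

// Retry briefly if file.json happens to be missing between the unlink
// and the rename.
function readState(cb, attempt) {
    fs.readFile('file.json', 'utf8', function (err, data) {
        if (err && err.code === 'ENOENT' && (attempt || 0) < 5) {
            return setTimeout(function () { readState(cb, (attempt || 0) + 1) }, 50)
        }
        if (err) return cb(err)
        cb(null, JSON.parse(data))
    })
}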

node's fs.writeFile does not overwrite previous contents

I have a file which sometimes gets \00 null characters inside it. So I need to repair it.
That's why I'm reading it, removing the invalid characters and writing it again. BUT, fs.writeFile is not overwriting its previous contents. The new contents get appended, which is not what I want.
Is it because my write code is inside the read code?
fs.readFile('./' + file, function (err, data) {
    if (err) {
        console.error(err);
        return;
    }
    var str = data.toString();
    var repaired = str.slice(0, str.indexOf('\00')) + str.slice(str.lastIndexOf('\00') + 1, str.length);
    //console.log(repaired);
    fs.writeFile('./' + file, repaired, function (err) {
        if (err)
            console.error(err);
    });
});
I've also tried using {flag: 'w'} (which I think fs.writeFile may already have by default).
Thanks to @thefourtheye for pointing me in the proper direction.
As there was no \00 character in the file I was testing with, str.indexOf('\00') returned -1, so the first slice got (almost) the whole file, and str.slice(str.lastIndexOf('\00') + 1) got the whole file again. That's why it looked like the contents were being appended.
Using the replace function did the job:
var repaired = str.replace(/\00/g,'');
I had the same or similar problem: when I called fs.writeFile() with different content but the same file, if the new content was shorter than the existing content, it did not overwrite all of the previous file content.
I found an explanation why this may be happening and a suggested remedy at:
https://github.com/nodejs/node-v0.x-archive/issues/4965.
According to that "This is not a bug. (Or at least, not one that Node has ever pretended to address...."
The suggested solution is "wait for the callback". I assume that means "wait for the write-callback to be called before trying to read the file". That makes sense of course, you should not try to read what may not have been fully written yet.
But, if you write to the same file several times like I did, then waiting for the (first) write-callback to complete before reading is not enough. Why? Because another 'write' may be in progress when you do the reading, and thus you can get garbled content.
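One way to make "wait for the callback" hold across repeated writes is to queue them so a new write only starts after the previous one has finished. A rough sketch (queuedWrite is just an illustrative name):

const fs = require('fs')

// Chain every write onto the previous one so only one fs.writeFile is in
// flight for the file at any time.
let writeQueue = Promise.resolve()

function queuedWrite(file, contents) {
    const next = writeQueue.catch(function () {}).then(function () {
        return new Promise(function (resolve, reject) {
            fs.writeFile(file, contents, function (err) {
                err ? reject(err) : resolve()
            })
        })
    })
    writeQueue = next
    return next
}

Then queuedWrite('./' + file, repaired) can replace the bare fs.writeFile call, and a read can safely be done once the returned promise resolves.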

Mongoose js batch find

I'm using mongoose 3.8. I need to fetch 100 documents, execute the callback function then fetch next 100 documents and do the same thing.
I thought .batchSize() would do the same thing, but I'm getting all the data at once.
Do I have to use limit or offset? If yes, can someone give a proper example to do it?
If it can be done with batchSize, why is it not working for me?
MySchema.find({}).batchSize(20).exec(function (err, docs) {
    console.log(docs.length)
});
I thought it would print 20 each time, but it's printing the whole count.
This link has the information you need.
You can do this,
var pagesize=100;
MySchema.find().skip(pagesize*(n-1)).limit(pagesize);
where n is the parameter you receive in the request: the page number the client wants to receive.
The docs say:
In most cases, modifying the batch size will not affect the user or the application, as the mongo shell and most drivers return results as if MongoDB returned a single batch.
You may want to take a look at streams and perhaps try to accumulate subresults:
var stream = Dummy.find({}).stream();

stream.on('data', function (dummy) {
    callback(dummy);
})
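A sketch of accumulating batches of 100 on top of that stream (processBatch here is a stand-in for whatever callback you want to run per batch):

var stream = MySchema.find({}).stream()
var batch = []

stream.on('data', function (doc) {
    batch.push(doc)
    if (batch.length === 100) {
        processBatch(batch) // your per-batch callback
        batch = []
    }
})

stream.on('close', function () {
    if (batch.length > 0) {
        processBatch(batch) // last, partial batch
    }
})

If processBatch does asynchronous work, you can stream.pause() before calling it and stream.resume() when it finishes, so the next batch doesn't pile up in memory.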

Observe file changes with node.js

I have the following use case:
A creates a chat and invites B and C. On the server, A creates a file. A, B and C write messages into this file. A, B and C read this file.
I want A to create a file on the server and observe this file; if anybody else writes something into it, send the new content back with websockets.
So, any change to this file should be observed by my node.js application.
How can I observe file changes? Is this possible with node.js without locking the files?
If it's not possible with files, would it be possible with a database object (NoSQL)?
The good news is that you can observe file changes with Node's API.
This however doesn't give you access to the content that has been written into the file.
You can maybe use the fs.appendFile() function so that when something is written into the file, you emit an event to something else that "logs" the new data that is being written.
fs.watch(): Directly pasted from the docs
fs.watch('somedir', function (event, filename) {
    console.log('event is: ' + event);
    if (filename) {
        console.log('filename provided: ' + filename);
    } else {
        console.log('filename not provided');
    }
});
Read here about the fs.watch(); function
EDIT: You can use the function
fs.watchFile();
Read here about the fs.watchFile(); function
This will allow you to watch a file for changes, i.e. whenever it is accessed by some other process of any kind.
Also you could use node-watch. Here's an easy example:
const watch = require('node-watch')
watch('README.md', function (event, filename) {
    console.log(filename, ' changed.')
})
I do not think you need to observe file changes or use a NoSQL database for this (if you do not want to). My advice would be to look at events (the Observer pattern). There are more than enough tutorials on this topic available online (Google), for example Felix's article about Using EventEmitters.
This publish/subscribe semantic can also be achieved with NoSQL. In Redis, for example, I think you should have a look at pubsub.
In MongoDB, I think tailable cursors are what you are looking for. On their blog they have a post explaining pub/sub.
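A bare-bones sketch of that EventEmitter idea for the chat use case (the 'message' event name, the chat.log file and sendToWebsocketClients() are placeholders for whatever you already have):

const EventEmitter = require('events').EventEmitter
const fs = require('fs')

// One emitter per chat acts as the pub/sub channel.
const chat = new EventEmitter()

// Subscribers (e.g. the websocket handlers for A, B and C) listen for messages.
chat.on('message', function (msg) {
    sendToWebsocketClients(msg)
})

// Whoever posts a message appends it to the file *and* publishes it,
// so nobody has to watch the file at all.
function postMessage(msg) {
    fs.appendFile('chat.log', msg + '\n', function (err) {
        if (!err) chat.emit('message', msg)
    })
}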
