How to handle sync browser emulations in node.js

I'm writing a script that loads some data from .txt files and then performs multiple requests (in a loop) to a website with Node.js' browser emulator Nightmare.
I have no problem reading from the txt files and so on, but I can't manage to make it run synchronously and without exceptions.
function visitPage(url, code) {
    new Promise((resolve, reject) => {
        Nightmare
            .goto(url)
            .click('.vote')
            .insert('input[name=username]', 'testadmin')
            .insert('.test-code-verify', code)
            .click('.button.vote.submit')
            .wait('.tag.vote.disabled,.validation-error')
            .evaluate(() => document.querySelector('.validation-error').innerHTML)
            .end()
            .then(text => {
                return text;
            })
    });
}
async function myBackEndLogic() {
    try {
        var br = 0, user, proxy, current, agent;
        while (br < loops) {
            current = Math.floor(Math.random() * (maxLoops - br - 1));
            /*...getting user and so on..*/
            const response = await visitPage('https://example.com/admin/login', "code");
            br++;
        }
    } catch (error) {
        console.error('ERROR:');
        console.error(error);
    }
}
myBackEndLogic();
The error that occurs is:
UnhandledPromiseRejectionWarning: TypeError: Cannot read property 'webContents' of undefined
So I have a few questions:
1) How to fix the exception
2) How to make it actually run synchronously and visit the address every time (in a previous attempt, which I didn't save, I had fixed the exception, but the browser wasn't actually opening and the step was basically skipped)
3) (Not so important) Is it possible to select a few objects with
.wait('.class1,.class2,.validation-error')
and save each value in a different variable, or just get the text from the first one that occurred? (If none of these has occurred, return 0, for example.)

I see a few issues with the code above.
In the visitPage function, you are returning a Promise. That's fine, except you don't have to create the wrapping promise! It looks like Nightmare returns a promise for you. As written, you're dropping any errors that promise produces by wrapping it. Instead, just use an async function!
async function visitPage(url, code) {
    return Nightmare
        .goto(url)
        .click('.vote')
        .insert('input[name=username]', 'testadmin')
        .insert('.test-code-verify', code)
        .click('.button.vote.submit')
        .wait('.tag.vote.disabled,.validation-error')
        .evaluate(() => document.querySelector('.validation-error').innerHTML)
        .end();
}
You probably don't want to wrap the content of this method in a 'try/catch'. Just let the promises flow :)
async function myBackEndLogic() {
    var br = 0, user, proxy, current, agent;
    while (br < loops) {
        current = Math.floor(Math.random() * (maxLoops - br - 1));
        const response = await visitPage('https://example.com/admin/login', "code");
        br++;
    }
}
When you run your method - make sure to include a catch! Or a then! Otherwise, your app may exit early.
myBackEndLogic()
    .then(() => console.log('donesies!'))
    .catch(console.error);
I'm not sure if any of this will help with your specific issue, but hopefully it gets you on the right path :)
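As for the third question, here is a sketch (an addition, not part of the answer above): after waiting for either selector, evaluate in the page and return whichever element actually appeared, or 0 if neither did. The selectors are the ones from the question.
async function visitPage(url, code) {
    return Nightmare
        .goto(url)
        .click('.vote')
        .insert('input[name=username]', 'testadmin')
        .insert('.test-code-verify', code)
        .click('.button.vote.submit')
        .wait('.tag.vote.disabled,.validation-error')
        .evaluate(() => {
            // Check each selector and report which one matched.
            const done = document.querySelector('.tag.vote.disabled');
            const error = document.querySelector('.validation-error');
            if (error) return { kind: 'error', text: error.innerHTML };
            if (done) return { kind: 'done', text: done.innerHTML };
            return 0; // neither selector matched
        })
        .end();
}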

Related

Problem reading an event stream in AWS Lambda: Node.js code works locally as desired but not in AWS Lambda

Here's the workflow:
Get a https link --> write to filesystem --> read from filesystem --> Get the sha256 hash.
It works fine on my local machine running Node 10.15.3, but when I run it as a Lambda function on AWS, the output is null. The problem may lie with the readable stream. Here's the code. You can run it directly on your local machine, and it will output a sha256 hash as required. If you wish to run it on AWS Lambda, comment/uncomment the lines as marked.
//Reference: https://stackoverflow.com/questions/11944932/how-to-download-a-file-with-node-js-without-using-third-party-libraries
var https = require('https');
var fs = require('fs');
var crypto = require('crypto');

const url = "https://upload.wikimedia.org/wikipedia/commons/a/a8/TEIDE.JPG";
const dest = "/tmp/doc";
let hexData;

async function writeit() {
    var file = fs.createWriteStream(dest);
    return new Promise((resolve, reject) => {
        var responseSent = false;
        https.get(url, response => {
            response.pipe(file);
            file.on('finish', () => {
                file.close(() => {
                    if (responseSent) return;
                    responseSent = true;
                    resolve();
                });
            });
        }).on('error', err => {
            if (responseSent) return;
            responseSent = true;
            reject(err);
        });
    });
}

const readit = async () => {
    await writeit();
    var readandhex = fs.createReadStream(dest).pipe(crypto.createHash('sha256').setEncoding('hex'));
    try {
        readandhex.on('finish', function () { //MAY BE PROBLEM IS HERE.
            console.log(this.read());
            fs.unlink(dest, () => {});
        });
    } catch (err) {
        console.log(err);
        return err;
    }
};

const handler = async () => { //Comment this line to run the code on AWS Lambda
//exports.handler = async (event) => { //UNComment this line to run the code on AWS Lambda
    try {
        hexData = readit();
    } catch (err) {
        console.log(err);
        return err;
    }
    return hexData;
};

handler(); //Comment this line to run the code on AWS Lambda
There can be multiple things that you need to check.
Since the URL you are accessing is a public one, make sure either your Lambda is outside a VPC or your VPC has a NAT Gateway attached with internet access.
/tmp is a valid temp directory for Lambda, but you may need to create the doc folder inside /tmp before using it.
You can check the CloudWatch logs for more information on what's going on, if enabled.
I've seen this difference in behaviour between local and lambda before.
All async functions return promises. Async functions must be awaited. Calling an async function without awaiting it means execution continues to the next line(s), and potentially out of the calling function.
So your code:
exports.handler = async (event) => {
    try {
        hexData = readit();
    } catch (err) {
        console.log(err);
        return err;
    }
    return hexData;
};
readit() is defined as const readit = async () => { ... }, but your handler does not await it. Therefore hexData = readit(); assigns an unresolved promise to hexData and returns it; the handler exits and the Lambda "completes" without the code in readit() having finished executing.
The simple fix then is to await the async function: hexData = await readit();. The reason it works locally in Node is that the node process will wait for promises to resolve before exiting, even though the handler function has already returned. But since Lambda "returns" as soon as the handler returns, unresolved promises remain unresolved. (As an aside, there is no need for the writeit function to be marked async, because it doesn't await anything and already returns a promise.)
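For clarity, here is the handler with that one-line fix applied:
exports.handler = async (event) => {
    try {
        // Awaiting here keeps the handler alive until readit()
        // has actually finished executing.
        hexData = await readit();
    } catch (err) {
        console.log(err);
        return err;
    }
    return hexData;
};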
That being said, I don't know promises well, and I barely know anything about events. So there are other things which raise warning flags for me, but I'm not sure about them; maybe they're perfectly fine, but I'll raise them here just in case:
file.on('finish' and readandhex.on('finish'. These are both events, and I believe they are non-blocking, so why would the handler, and therefore the Lambda, wait around for them?
In the first case, it's within a promise and resolve() is called from within the event handler, so that may be fine (as I said, I don't know much about these 2 subjects, so am not sure). The important thing is that the code must block at that point until the promise is resolved. If the code can continue execution (i.e. return from writeit()) until the finish event is raised, then it won't work.
The second case is almost certainly going to be a problem, because it's just saying that if event x is raised, then do y. There's no promise being awaited, so nothing to block the code, so it will happily continue to the end of the readit() function, and then the handler and Lambda. Again, this is based on the assumption that events are non-blocking (in the sense that a declaration that you want to execute some code on some event does not wait at that point for that event to be raised).
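If that analysis is right, one way to make the second case block is to have readit() return a promise that only resolves on the 'finish' event, along these lines (a sketch of the idea, not tested on Lambda):
const readit = async () => {
    await writeit();
    const readandhex = fs.createReadStream(dest)
        .pipe(crypto.createHash('sha256').setEncoding('hex'));
    // Wrapping the event in a promise forces the caller's await
    // to block until the hash is actually ready.
    return new Promise((resolve, reject) => {
        readandhex.on('finish', function () {
            const hash = this.read();
            fs.unlink(dest, () => {});
            resolve(hash);
        });
        readandhex.on('error', reject);
    });
};
Combined with hexData = await readit(); in the handler, the Lambda would then return the hash itself rather than a pending promise.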

Converting Promise.all to gradual promise resolution (every 3 promises, for example) does not work

I have a list of promises, and currently I am using Promise.all to resolve them.
Here is my code for now:
const pageFutures = myQuery.pages.map(async (pageNumber: number) => {
    const urlObject: any = await this._service.getResultURL(searchRecord.details.id, authorization, pageNumber);
    if (!urlObject.url) {
        // throw error
    }
    const data = await rp.get({
        gzip: true,
        headers: {
            "Accept-Encoding": "gzip,deflate",
        },
        json: true,
        uri: `${urlObject.url}`,
    });
    const objects = data.objects.filter((object: any) => object.type === "observed-data" && object.created);
    return new Promise((resolve, reject) => {
        this._resultsDatastore.bulkInsert(
            databaseName,
            objects
        ).then(succ => {
            resolve(succ);
        }, err => {
            reject(err);
        });
    });
});
const all: any = await Promise.all(pageFutures).catch(e => {
    console.log(e);
});
So as you can see, I use Promise.all and it works:
const all: any = await Promise.all(pageFutures).catch(e => {
    console.log(e);
});
However, I noticed it affects the database performance-wise, so I decided to resolve only 3 of them at a time.
For that I was thinking of different ways, like cwait, async-pool, or writing my own iterator,
but I am confused about how to do that.
For example, when I use cwait:
let promiseQueue = new TaskQueue(Promise, 3);
const all = new Promise.map(pageFutures, promiseQueue.wrap(() => {}));
I do not know what to pass inside the wrap, so I pass () => {} for now; plus I get
Property 'map' does not exist on type 'PromiseConstructor'.
So whatever way gets it working (my own iterator or any library), I am OK with it, as long as I have a good understanding of it.
I'd appreciate it if anyone can shed light on this and help me get out of this confusion.
First, some remarks:
Indeed, in your current setup, the database may have to process several bulk inserts concurrently. But that concurrency is not caused by using Promise.all. Even if you had left out Promise.all from your code, it would still have that behaviour. That is because the promises were already created, and so the database requests will be executed anyway.
Not related to your issue, but don't use the promise constructor antipattern: there is no need to create a promise with new Promise when you already have a promise in your hands: bulkInsert() returns a promise, so return that one.
As your concern is about the database load, I would limit the work initiated by the pageFutures promises to the non-database aspects: they don't have to wait for each other's resolution, so that code can stay like it was.
Let those promises resolve with what you currently store in objects: the data you want to have inserted. Then concatenate all those arrays together to one big array, and feed that to one database bulkInsert() call.
Here is how that could look:
const pageFutures = myQuery.pages.map(async (pageNumber: number) => {
    const urlObject: any = await this._service.getResultURL(searchRecord.details.id,
        authorization, pageNumber);
    if (!urlObject.url) {
        // throw error
    }
    const data = await rp.get({
        gzip: true,
        headers: { "Accept-Encoding": "gzip,deflate" },
        json: true,
        uri: `${urlObject.url}`,
    });
    // Return here, don't access the database yet...
    return data.objects.filter((object: any) => object.type === "observed-data"
        && object.created);
});
const all: any = (await Promise.all(pageFutures).catch(e => {
    console.log(e);
    return []; // in case of error, still return an array
})).flat(); // flatten it, so all data chunks are concatenated in one long array

// Don't create a new Promise with `new`, only to wrap an other promise.
// It is an antipattern. Use the promise returned by `bulkInsert`.
return this._resultsDatastore.bulkInsert(databaseName, all);
This uses .flat(), which is rather new. In case you have no support for it, look at the alternatives provided on MDN.
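For instance, one of the usual fallbacks is a one-level concat with spread (a sketch; pageResults is a name introduced here for the awaited array of arrays):
// Equivalent of pageResults.flat() for one level of nesting,
// for runtimes without Array.prototype.flat().
const all = [].concat(...pageResults);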
First, you asked a question about a failing solution attempt. That is called an X/Y problem.
So in fact, as I understand your question, you want to delay some DB requests.
You don't want to delay the resolving of a promise created by a DB request... No! Don't try that! The promise will resolve when the DB returns a result. It's a bad idea to interfere with that process.
I banged my head for a while against the library you tried... but I could not do anything to solve your issue with it. So I came up with the idea of just looping over the data and setting some timeouts.
I made a runnable demo here: Delaying DB request in small batch
Here is the code. Notice that I simulated some data and a DB request; you will have to adapt it. You also will have to adjust the timeout delay. A full second certainly is too long.
// That part is to simulate some data you would like to save.
// Let's make it a random amount for fun.
let howMuch = Math.ceil(Math.random() * 20);

// A fake data array...
let someData = [];
for (let i = 0; i < howMuch; i++) {
    someData.push("Data #" + i);
}
console.log("Some fake data");
console.log(someData);
console.log("");

// So we have some data that looks real. (lol)
// We want to save it in small groups.
// And that is to simulate your DB request.
let saveToDB = (data, dataIterator) => {
    console.log("Requesting DB...");
    return new Promise(function (resolve, reject) {
        resolve("Request #" + dataIterator + " complete.");
    });
};

// Ok, we have everything. Let's proceed!
let batchSize = 3; // The amount of requests to do at once.
let delay = 1000;  // The delay between each batch.

// Loop through all the data you have.
for (let i = 0; i < someData.length; i++) {
    if (i % batchSize == 0) {
        console.log("Splitting in batch...");
        // Process a batch on one timeout.
        let timeout = setTimeout(() => {
            // An empty line to clarify the console.
            console.log("");
            // Grouping the requests by "batchSize", or less if we're almost done.
            for (let j = 0; j < batchSize; j++) {
                // If there still is data to process.
                if (i + j < someData.length) {
                    // Your real database request goes here.
                    saveToDB(someData[i + j], i + j).then(result => {
                        console.log(result);
                        // Do something with the result.
                        // ...
                    });
                } // END if there is still data.
            } // END sending requests for that batch.
        }, delay * i); // Timeout delay.
    } // END splitting in batch.
} // END for each data.

Trouble with Promise and Async functions

I am trying to use promises and async functions to send chunks of an array to an API call that inserts them into a DB. I am trying to get the top function to chunk the array and await the backend finishing each chunk before it moves on to the next one. It errors out after the first iteration. Any ideas?
async chunkArray(arr) {
    let len = arr.length;
    let update_arr;
    let i = 1;
    for (i; i <= len; i++) {
        if ((i % 125) === 0) {
            update_arr = arr.slice(i - 125, i);
            await this.updateChuckArray(update_arr);
        } else if (i === len) {
            let div = (Math.floor(len / 125) * 125);
            update_arr = arr.slice(div, len);
            await this.updateChuckArray(update_arr);
        }
    }
},
updateChuckArray(update) {
    return new Promise(resolve => {
        this.$http.put(`/route`, update).then(res => {
            res.data.error ? this.$root.updateError(res.data.error) : this.$root.updateSuccess(res.data.message);
        }).catch(error => {
            this.$root.updateError(res.data.error);
        });
    });
}
First off, your updateChuckArray() never resolves the promise it returns (you never call resolve()).
Instead of manually wrapping your function call in a new promise (that is a promise anti-pattern), you can just return the promise you already have and write it like this:
updateChuckArray(update) {
    return this.$http.put(`/route`, update).then(res => {
        res.data.error ? this.$root.updateError(res.data.error) : this.$root.updateSuccess(res.data.message);
    }).catch(error => {
        this.$root.updateError(error);
    });
}
FYI, it's unclear what your error handling strategy is. The way the code is written (and followed above), you catch an error from this.$http.put(), handle it, and let the loop continue. If that's what you want, this will work. If you want the for loop to abort on error, then you need to rethrow the error in the .catch() handler so the error gets back to the await.
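For example, the abort-on-error variant is the same code with a rethrow added in the .catch() handler:
updateChuckArray(update) {
    return this.$http.put(`/route`, update).then(res => {
        res.data.error ? this.$root.updateError(res.data.error) : this.$root.updateSuccess(res.data.message);
    }).catch(error => {
        this.$root.updateError(error);
        // Rethrow so the rejection propagates back to the await
        // in chunkArray() and stops the loop.
        throw error;
    });
}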
Also, note that in your .catch() handler, you were doing this:
this.$root.updateError(res.data.error)
but there is no res defined there. The error is in error. You would need to use that in order to report the error. I'm not sure what the structure of the error object is here or what exactly you pass to this.$root.updateError(), but it must be something that comes from the error object, not an object named res.

In a Firebase cloud function, the bucket.upload promise resolves too early

I wrote a function that works like this:
onNewZipFileRequested
{get all the necessary data}
.then{download all the files}
.then{create a zipfile with all those files}
.then{upload that zipfile} (*here is the problem)
.then{update the database with the signedUrl of the file}
Here is the relevant code
[***CREATION OF ZIP FILE WORKING****]
}).then(() => {
    zip.generateNodeStream({type: 'nodebuffer', streamFiles: true})
        .pipe(fs.createWriteStream(tempPath))
        .on('finish', function () {
            console.log("zip written.");
            return bucket.upload(tempPath, { //**** problem****
                destination: destinazionePath
            });
        });
}).then(() => {
    const config = {
        action: 'read',
        expires: '03-09-2391'
    };
    return bucket.file(destinazionePath).getSignedUrl(config);
}).then(risultato => {
    const daSalvare = {
        signedUrl: risultato[0],
        status: 'fatto',
        dataInserimento: zipball.dataInserimento
    };
    return event.data.ref.set(daSalvare);
})
On the client side, as soon as the app sees the status change and the new URL, a download button (pointing to the new URL) appears.
Everything is working, but if I try to download the file immediately... there is no file yet!
If I wait some time and retry, the file is there.
I noted that the time I have to wait depends on the size of the zipfile.
The bucket.upload promise should resolve at the end of the upload, but apparently it fires too early.
Is there a way to know exactly when the file is ready?
I may have to make some very big files; it's not a problem if the process takes several minutes, but I need to know when it's over.
* EDIT *
There was some unnecessary nesting in the code. While it was not the error (results are the same before and after refactoring), it was causing some confusion in the answers, so I edited it out.
I'd like to point out that I update the database only after getting the signed URL, and I get that only after the upload (I could not otherwise), so to get any result at all the promise chain MUST work, and in fact it does. When the download button appears on the client side (which happens when 'status' becomes 'fatto'), it is already linked to the correct signed URL, but if I press it too early the file is not there (Failed - No file). If I wait a few seconds (the bigger the file, the longer I have to wait), then the file is there.
(English is not my mother language; if I have been unclear, ask and I will try to explain myself better.)
It looks like the problem could be that the braces are not aligned properly, causing a then statement to be embedded within another. Here is the code with the then statements separated:
[***CREATION OF ZIP FILE WORKING****]
}).then(() => {
    zip.generateNodeStream({type: 'nodebuffer', streamFiles: true})
        .pipe(fs.createWriteStream(tempPath))
        .on('finish', function () {
            console.log('zip written.');
            return bucket.upload(tempPath, {
                destination: destinazionePath
            });
        });
}).then(() => {
    const config = {
        action: 'read',
        expires: '03-09-2391'
    };
    return bucket.file(destinazionePath).getSignedUrl(config);
}).then(risultato => {
    const daSalvare = {
        signedUrl: risultato[0],
        status: 'fatto',
        dataInserimento: zipball.dataInserimento
    };
    return event.data.ref.set(daSalvare);
})

Node promises: use first/any result (Q library)

I understand that using the Q library it's easy to wait on a number of promises to complete and then work with the list of values corresponding to those promise results:
Q.all([
    promise1,
    promise2,
    .
    .
    .
    promiseN,
]).then(function(results) {
    // results is a list of all the values from 1 to n
});
What happens, however, if I am only interested in the single, fastest-to-complete result? To give a use case: Say I am interested in examining a big list of files, and I'm content as soon as I find ANY file which contains the word "zimbabwe".
I can do it like this:
Q.all(fileNames.map(function(fileName) {
    return readFilePromise(fileName).then(function(fileContents) {
        return fileContents.contains('zimbabwe') ? fileContents : null;
    });
})).then(function(results) {
    var zimbabweFile = results.filter(function(r) { return r !== null; })[0];
});
But I need to finish processing every file even if I've already found "zimbabwe". If I have a 2kb file containing "zimbabwe", and a 30tb file not containing "zimbabwe" (and suppose I'm reading files asynchronously) - that's dumb!
What I want to be able to do is get a value the moment any promise is satisfied:
Q.any(fileNames.map(function(fileName) {
    return readFilePromise(fileName).then(function(fileContents) {
        if (fileContents.contains('zimbabwe')) return fileContents;
        /*
        Indicate failure
        -Return "null" or "undefined"?
        -Throw error?
        */
    });
})).then(function(result) {
    // Only one result!
    var zimbabweFile = result;
}).fail(function() { /* no "zimbabwe" found */ });
With this approach I won't be waiting on my 30tb file if "zimbabwe" is discovered in my 2kb file early on.
But there is no such thing as Q.any!
My question: How do I get this behaviour?
Important note: This should return without errors even if an error occurs in one of the inner promises.
Note: I know I could hack Q.all by throwing an error when I find the 1st valid value, but I'd prefer to avoid this.
Note: I know that Q.any-like behavior could be incorrect, or inappropriate in many cases. Please trust that I have a valid use-case!
You are mixing two separate issues: racing, and cancelling.
Racing is easy, either using Promise.race, or the equivalent in your favorite promise library. If you prefer, you could write it yourself in about two lines:
function race(promises) {
    return new Promise((resolve, reject) =>
        promises.forEach(promise => promise.then(resolve, reject)));
}
That will reject if any promise rejects. If instead you want to skip rejects, and only reject if all promises reject, then
function race(promises) {
    let rejected = 0;
    return new Promise((resolve, reject) =>
        promises.forEach(promise => promise.then(
            resolve,
            () => { if (++rejected === promises.length) reject(); }
        )));
}
Or, you could use the promise inversion trick with Promise.all, which I won't go into here.
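For reference, here is a sketch of that inversion trick (added for completeness; it is not spelled out in the original answer): swap resolution and rejection on each promise, so that Promise.all settles as soon as the first original promise fulfills.
// Swap resolution and rejection on a single promise.
function invert(promise) {
    return new Promise((resolve, reject) => promise.then(reject, resolve));
}

// Resolves with the first fulfillment; rejects only if ALL promises
// reject (the inverted Promise.all then resolves with the list of
// rejection reasons, and the outer invert turns that into a rejection).
function raceToSuccess(promises) {
    return invert(Promise.all(promises.map(invert)));
}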
Your real problem is different: you apparently want to "cancel" the other promises when some other one resolves. For that, you will need additional, specialized machinery. The object that represents each segment of processing will need some way to ask it to terminate. Here's some pseudo-code:
class Processor {
    promise() { ... }
    terminate() { ... }
}
Now you can write your version of race as
function race(processors) {
    let rejected = 0;
    return new Promise((resolve, reject) =>
        processors.forEach(processor => processor.promise().then(
            () => {
                resolve();
                processors.forEach(processor => processor.terminate());
            },
            () => { if (++rejected === processors.length) reject(); }
        )));
}
There are various proposals to handle promise cancellation which might make this easier when they are implemented in a few years.
