Does PhantomJS store any persistent files by default?

As a new user to PhantomJS, I want to be sure that I understand how PhantomJS handles any persistence of data that it accumulates from an HTTP request.
My question is: does PhantomJS store any data persistently by default (i.e., in a simple example where you are not using require('fs') anywhere in the script to store the response, just dumping it to STDOUT)? I am assuming that all of the work from a page.evaluate() call is done in memory.
Here is a simple example for illustration:
var page = require('webpage').create(),
    system = require('system'),
    address;

if (system.args.length !== 2) {
    console.log('Usage: phantomjs thisFile.js URL');
    phantom.exit(1);
} else {
    address = system.args[1];
    page.open(address, function (status) {
        if (status !== 'success') {
            console.log('Unable to load the address!');
            phantom.exit(1);
        } else {
            // Wait for the js to finish loading
            window.setTimeout(function () {
                var results = page.evaluate(function () {
                    return document.documentElement.innerHTML;
                });
                console.log(results); // This goes to stdout
                phantom.exit(0);
            }, 200);
        }
        console.log("Done.");
    });
}
This script would be called with something like phantomjs thisFile.js www.example.com.
I know that you can save a page to a file; I just want to be sure that I am aware of all the places where PhantomJS may accumulate data on its own.
I have also looked into how PhantomJS handles cookies.

Yes, there is one type of data that is saved by default: the localStorage database.
On Windows 7: C:\Users\<user>\AppData\Local\Ofi Labs\PhantomJS
On Windows 8: C:\Ofi Labs\PhantomJs
On Linux: /home/<user>/.qws/share/data/Ofi Labs/PhantomJS
Everything else is saved only when you enable it with command-line options. The disk cache (when enabled) lives inside the directory above, and the cookie file path has to be set explicitly.
This means that if the web application you are testing doesn't use localStorage, you can run multiple PhantomJS instances in parallel without them stepping on each other's state.
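If you do need isolation, the relevant storage can be relocated or disabled per instance with PhantomJS command-line switches. A hedged example invocation (the paths are illustrative, and --local-storage-path requires PhantomJS 2.x):
phantomjs --local-storage-path=/tmp/phantom1 --disk-cache=false --cookies-file=/tmp/phantom1/cookies.txt thisFile.js www.example.com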

Related

How can I write buffer data to a file from a readable stream in Node.js?

How can I write buffer data to a file from a readable stream in Node.js? I know there are already npm packages for this; I am asking this question for learning purposes only. I am also wondering why there is no such method in 'fs' where the user can pass a readable stream and create a file directly.
I tried to write a stream.readableBuffer to a file using fs.write by passing the buffer directly, but somehow a small portion of the file is corrupt after writing: I can see the image, but a small portion of it looks black. My guess is the buffer was not written completely.
I pass FormData from an ajax XMLHttpRequest to a server-side controller (a node js router in this case),
and I used the npm package 'parse-formdata' to parse the request. Below is the code:
parseFormdata(req, function (err, data) {
    if (err) {
        logger.error(err);
        throw err;
    }
    console.log('fields:', data.fields); // I have data here but how to write this data to a file?
    /** perhaps a bad way to write the data to a file, looking for a better way **/
    var chunk = data.parts[0].stream.readableBuffer.head.chunk;
    fs.writeFile(data.parts[0].filename, chunk, function (err) {
        if (err) {
            console.log(err);
        } else {
            console.log("The file was saved!");
        }
    });
});
Could somebody tell me a better approach to write the data (that I got from parsing the FormData)?
According to the parse-formdata documentation, you may use the provided sample:
var pump = require('pump')
var concat = require('concat-stream')

pump(stream, concat(function (buf) {
    assert.equal(String(buf), String(file), 'file received')
    // then write to your file
    res.end()
}))
But you can do it more concisely:
const ws = fs.createWriteStream('out.txt')
data.parts[0].stream.pipe(ws)
Finally, note that the library has not been updated since 2017, so there may be unpatched vulnerabilities.
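Tying this back to the question's code, here is a minimal sketch of the piping approach with completion and error handling, assuming (as in the question) that parse-formdata yields data.parts[0] with stream and filename properties:
var fs = require('fs');

parseFormdata(req, function (err, data) {
    if (err) throw err;

    var part = data.parts[0];                      // first uploaded file part
    var ws = fs.createWriteStream(part.filename);  // destination file

    // pipe() returns the destination stream, so listen for completion on it
    part.stream.pipe(ws)
        .on('finish', function () { console.log('The file was saved!'); })
        .on('error', function (e) { console.error(e); });
});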

Scraping dynamic data of a web page in nodejs

Using node.js, I am trying to scrape a web page. For this, I am using the cheerio and tinyreq modules. My source code is as follows:
// Dependencies
const cheerio = require("cheerio");
const req = require("tinyreq");

// scrape function
function scrape(url, data, cb) {
    req(url, (err, body) => {
        if (err) { return cb(err); }
        let $ = cheerio.load(body)
          , pageData = {};
        Object.keys(data).forEach(k => {
            pageData[k] = $(data[k]).text();
        });
        cb(null, pageData);
    });
}

scrape("https://www.activecubs.com/activity-wheel/", {
    title: ".row h1"
  , description: ".row h2"
}, (err, data) => {
    console.log(err || data);
});
In my code, the text in the h1 tag is static, while the text in the h2 tag is dynamic. When I run the code, I only get the static data, i.e., the description field is empty. Following previous Stack Overflow questions, I tried using PhantomJS to overcome this issue, but it doesn't work for me. The dynamic data here is the data obtained by rotating a wheel. For any doubts about the website I am using, you can check https://www.activecubs.com/activity-wheel/.
The Cheerio documentation is pretty clear that it does not execute JavaScript:
https://github.com/cheeriojs/cheerio#cheerio-is-not-a-web-browser
See also https://github.com/segmentio/nightmare
User actions can be performed using SpookyJS.
SpookyJS makes it possible to drive CasperJS suites from Node.js. At a high level, Spooky accomplishes this by spawning Casper as a child process and controlling it via RPC.
Specifically, each Spooky instance spawns a child Casper process that runs a bootstrap script. The bootstrap script sets up a JSON-RPC server that listens for commands from the parent Spooky instance over a transport (either HTTP or stdio). The script also sets up a JSON-RPC client that sends events to the parent Spooky instance via stdout. Check the documentation
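The SpookyJS example linked in the original answer is not reproduced here. As an illustration of the same idea, here is a minimal sketch using Nightmare (suggested above): it loads the page in a headless browser, waits for the dynamic element to render, and only then reads it. The selectors are the ones from the question; treat this as a sketch, not a verified script.
var Nightmare = require('nightmare');
var nightmare = Nightmare({ show: false });

nightmare
    .goto('https://www.activecubs.com/activity-wheel/')
    .wait('.row h2')                 // wait until the dynamic h2 has rendered
    .evaluate(function () {
        return {
            title: document.querySelector('.row h1').textContent,
            description: document.querySelector('.row h2').textContent
        };
    })
    .end()
    .then(function (data) { console.log(data); })
    .catch(function (err) { console.error(err); });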

NodeJS readdir() function always being run twice

I've been trying to pick up NodeJS and learn more for backend development purposes. I can't seem to wrap my mind around async tasks, though, and I have an example here that I've spent hours on, searching for a solution.
app.get('/initialize_all_pictures', function(req, res){
    var path = './images/';
    fs.readdir(path, function(err, items){
        if (err){
            console.log("there was an error");
            return;
        }
        console.log(items.length);
        for (var i = 0; i < items.length; i++){
            var photo = new Photo(path + items[i], 0, 0, Math.floor(Math.random() * 1000));
            photoArray.push(photo);
        }
    });
    res.json({"Success" : "Done"}); // note: this responds before readdir's callback runs
});
Currently, I have this endpoint that is supposed to look through a directory called images, create "Photo" objects, and push them into a global array called photoArray. It works, except that the readdir callback is always called twice.
console.log would always give output of
2
2
(I have two items in the directory).
Why is this?
Just figured out the problem.
I had a Chrome extension that helps me format JSON values from HTTP requests. Unfortunately, the extension actually made an additional call to the endpoint, so whenever I pointed my browser at the endpoint, the function ended up being called twice!
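For anyone debugging a similar symptom: a quick way to see where duplicate calls come from is to log every incoming request before it reaches the route. A minimal sketch, assuming an Express app as in the question:
// Log each request so duplicate callers (extensions, favicon fetches, etc.) show up
app.use(function (req, res, next) {
    console.log(new Date().toISOString(), req.method, req.originalUrl,
        req.headers['user-agent']);
    next();
});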

Can a node.js server know if a server file is created?

What is the most elegant way, or technology, to let a node.js server know that a file has been created on the server?
The idea is: a new image has been created (from a webcam or so) -> dispatch an event!
UPDATE: The name of the new file in the directory is not known a priori, and the file is generated by external software.
You should take a look at fs.watch(). It allows you to "watch" a file or directory and receive events when things change.
Note: The documentation states that fs.watch is not consistent across platforms, so you should take that into account before using it.
fs.watch(fileOrDirectoryPath, function(event, filename) {
    // Something changed with filename, trigger event appropriately
});
Also something to be aware of from the docs:
Providing filename argument in the callback is not supported on every
platform (currently it's only supported on Linux and Windows). Even on
supported platforms filename is not always guaranteed to be provided.
Therefore, don't assume that filename argument is always provided in
the callback, and have some fallback logic if it is null.
If filename is not available on your platform and you're watching a directory, you may need to initially read the directory and cache the list of files in it. Then, when you get an event from fs.watch, read the directory again and compare it to the cached list to see what was added (if anything).
Update 1: There's a good module called watch on GitHub, which makes it easy to watch a directory for new files.
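As a hedged sketch of that module's API (method names taken from its README, not re-verified here), watching a directory for created files looks roughly like this:
var watch = require('watch');

watch.createMonitor('/path/to/dir', function (monitor) {
    monitor.on('created', function (f, stat) {
        // f is the path of the newly created file
        console.log('New file:', f);
    });
});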
Update 2: I threw together an example of how to use fs.watch to get notified when new files are added to a directory. I think the module linked above is probably the better way to go, but I thought it would be nice to have a basic example of how it might work if you were to do it yourself.
Note: This is a fairly simplistic example, just to show how it could work in general. It could almost certainly be done more efficiently, and it's far from thoroughly tested.
function watchForNewFiles(directory, callback) {
    // Get a list of all the files in the directory
    fs.readdir(directory, function(err, files) {
        if (err) {
            callback(err);
        } else {
            var originalFiles = files;
            // Start watching the directory for new events
            var watcher = fs.watch(directory, function(event, filename) {
                // Get the updated list of all the files in the directory
                fs.readdir(directory, function(err, files) {
                    if (err) {
                        callback(err);
                    } else {
                        // Filter out any files we already knew about
                        var newFiles = files.filter(function(f) {
                            return (originalFiles.indexOf(f) < 0);
                        });
                        // Reset our list of "original" files
                        originalFiles = files;
                        // If there are new files detected, call the callback
                        if (newFiles.length) {
                            callback(null, newFiles);
                        }
                    }
                });
            });
        }
    });
}
Then, to watch a directory, you'd call it with:
watchForNewFiles(someDirectoryPath, function(err, files) {
    if (err) {
        // handle error
    } else {
        // handle any newly added files
        // "files" is an array of filenames that have been added to the directory
    }
});
I came up with my own solution using this code:
var fs = require('fs');
var intID = setInterval(check, 1000);

function check() {
    fs.exists('file.txt', function (exists) {
        if (exists) {
            console.log("Created!");
            clearInterval(intID);
        }
    });
}
You could add a parameter to the check function for the name of the file and call it on whatever path you need.
I did some tests on fs.watch() and it does not fire if the file has not been created yet. fs.watch() has multiple issues anyway, and I would never suggest using it... It does work for detecting that a file was deleted, though.
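As an alternative to hand-rolled setInterval polling, fs.watchFile also works here: it polls stat() internally and fires when a file that did not yet exist appears (the Node docs state that the stats are zero-filled while the file is missing). A minimal sketch:
var fs = require('fs');

fs.watchFile('file.txt', { interval: 1000 }, function (curr, prev) {
    // While the file is missing, its stats are zero-filled, so mtime is the epoch.
    if (prev.mtime.getTime() === 0 && curr.mtime.getTime() !== 0) {
        console.log('Created!');
        fs.unwatchFile('file.txt');
    }
});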

selenium-webdriver npm wait

I have a simple script that performs a login using the selenium-webdriver npm module. The script works, but it is really slow, and the wait timeout gives very odd results (sometimes it seems to time out immediately, and other times it waits far past the defined timeout).
Am I doing something wrong that would make the login very slow (perhaps running this through a Selenium hub)? The site itself is very responsive.
Here is the script:
var webdriver = require('selenium-webdriver');

var driver = new webdriver.Builder().
    usingServer('http://hubserver:4444/wd/hub').
    withCapabilities(webdriver.Capabilities.firefox()).
    build();

console.log('\n\nStarting login.');
console.log('\nConnecting to grid: http://hubserver:4444/wd/hub');

// Load the login page and wait for the form to display
driver.get('https://testsite.com');
driver.wait(function() {
    return driver.isElementPresent(webdriver.By.name('user'));
}, 3000, '\nFailed to load login page.');

// Enter the user name
driver.findElement(webdriver.By.name('user')).sendKeys('testuser').then(function() {
    console.log("\nEntering user name");
});

// Enter the password
driver.findElement(webdriver.By.name('pass')).sendKeys('testpwd').then(function() {
    console.log("\nEntering password");
});

// Click the login button
driver.findElement(webdriver.By.id('submit')).click().then(function() {
    console.log("\nLogging in.");
});

// Wait for the home page to load
driver.wait(function() {
    console.log("\nWaiting for page to load");
    return driver.isElementPresent(webdriver.By.id('main'));
}, 3000, '\nFailed to load home page.');

driver.getCurrentUrl().then(function(url) {
    console.log("\nPage loaded: " + url);
});

driver.quit();
Maybe you have it specified elsewhere, but in the code shown your driver.wait() has no amount of time specified.
Also, maybe I'm misunderstanding your code because I do this mainly in Python, but driver.wait(function(){}); looks odd to me. Is this really proper use of the JS bindings? Generally, you wait for the element to be found and then call a function that does something with that element. I can't write it in JS, but in pseudocode:
driver.wait(#element you're looking for)
# Handle exception if there is one
# Otherwise do something with element you're looking for
Also, I would think
driver.isElementPresent(webdriver.By.name('user'));
should be
driver.isElementPresent(By.name('user'));
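For reference, later versions of the JS bindings removed isElementPresent in favor of condition-based waits. A minimal sketch of the same login flow using until.elementLocated and the async API (hub URL, site, and selectors are the ones from the question; this is an illustration, not a drop-in fix):
const { Builder, By, until } = require('selenium-webdriver');

(async function login() {
    const driver = await new Builder()
        .usingServer('http://hubserver:4444/wd/hub')
        .forBrowser('firefox')
        .build();
    try {
        await driver.get('https://testsite.com');
        // Wait for the login form instead of polling isElementPresent
        const user = await driver.wait(until.elementLocated(By.name('user')), 3000);
        await user.sendKeys('testuser');
        await driver.findElement(By.name('pass')).sendKeys('testpwd');
        await driver.findElement(By.id('submit')).click();
        // Wait for the home page to load
        await driver.wait(until.elementLocated(By.id('main')), 3000);
        console.log('Page loaded: ' + (await driver.getCurrentUrl()));
    } finally {
        await driver.quit();
    }
})();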
