Node.js packages to handle parallel headless tests on Linux box(es) with Selenium Grid-like features? - node.js

I need to handle authenticated multiple users running parallel tests on the selenium standalone server, and discovered two webdriver clients on nodejs. There's webdriver-js and wd-js. Which is more active and reliable? Any experiences? I'm a bit concerned about them breaking down when node or selenium updates or removes features.
I don't think any of those packages mention automatically starting Xvfb on a unique display number per test. So should I run shell commands to start Xvfb before driving the browser?
The following process is what I am trying to build in nodejs (it's essentially like Grid 2, but on nodejs, for the purpose of running tests under continuous integration), and I am looking for packages or suggestions for any of the following parts.
First authenticate the user(s) using a persistent bi-directional connection (WebSockets or HTTP 1.1)
Start/queue tests requested to run by the user on available hardware nodes (I will add more linux boxes so need a package to distribute parallel tests across the "grid")
Monitor the running selenium browser tests and send client status updates (e.g. running/stopped)
Tests submitted by the users need to be persistent and accessible for future or continuous integration (couchdb or mysql)
Scheduling of jobs to be run on a continuous basis (ex. run every set interval of time).
Is nodejs a bit overkill? Should I focus on Java only for the back end?

https://github.com/LearnBoost/soda
This is for vanilla Sauce Labs/Selenium RC integration. I'd imagine when you're running in a browser instance like Selenium RC, websockets should just work, as the javascript on the page is executed. If you're authenticating a user, you want to just fill out whatever form and submit (which triggers your WS auth) as normal.
I don't think nodejs is overkill for this. Node is lightweight. I don't know that I'd add node to my stack ONLY for this, but it's certainly convenient, and if you have a commitment to javascript, it's no big deal.
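On the Xvfb part of the question: one way to do it from node is to hand out display numbers from a small pool and spawn Xvfb yourself before each test. The following is only a rough sketch. DisplayPool, xvfbArgs, startXvfb and browserEnv are made-up names (not from webdriver-js, wd-js or soda), the screen size is arbitrary, and it assumes the Xvfb binary is on the PATH of the test box.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: hands out X display numbers so that no two
// concurrently running tests share one Xvfb instance.
class DisplayPool {
  constructor(firstDisplay, size) {
    this.free = Array.from({ length: size }, (_, i) => firstDisplay + i);
  }

  // Returns a free display number, or null when every display is in use.
  acquire() {
    return this.free.length > 0 ? this.free.shift() : null;
  }

  // Puts a display number back once the test (and its Xvfb) has finished.
  release(display) {
    if (!this.free.includes(display)) this.free.push(display);
  }
}

// Command-line arguments for one Xvfb server on the given display.
// The 1280x1024x24 screen geometry is an arbitrary choice.
function xvfbArgs(display) {
  return [`:${display}`, '-screen', '0', '1280x1024x24'];
}

// Starts Xvfb as a child process (assumes Xvfb is installed and on PATH).
function startXvfb(display) {
  return spawn('Xvfb', xvfbArgs(display), { stdio: 'ignore' });
}

// Environment to pass to the browser or selenium process for this test.
function browserEnv(display) {
  return { ...process.env, DISPLAY: `:${display}` };
}

// Intended use (not executed here):
//   const pool = new DisplayPool(99, 4);
//   const display = pool.acquire();
//   const xvfb = startXvfb(display);
//   ...launch the browser with env: browserEnv(display)...
//   xvfb.kill(); pool.release(display);
```

When acquire() returns null, the test has to wait in your queue until another test releases its display. If you would rather not manage display numbers yourself, wrapping the browser command in xvfb-run with its -a (auto server number) option is another route.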

Related

Build an extensible system for scraping websites

Currently, I have a server running. Whenever I receive a request, I want some mechanism to start the scraping process on some other resource (preferably dynamically created), as I don't want to perform scraping on my main instance. Further, I don't want the other instance to keep running and charging me when I am not scraping data.
So, preferably a system that I can request to start scraping the site and close when it finishes.
Currently, I have looked at Google Cloud Functions, but they are capped at 9 minutes per function, so they won't fit my requirement, as scraping would take much longer than that. I have also looked at the AWS SDK, which allows us to create VMs at runtime and also shut them down, but I can't figure out how to push my API script onto the newly created AWS instance.
Further, the system should be extensible: I have many different scripts that scrape different websites. So, a robust solution would be ideal.
I am open to using any technology. Any help would be greatly appreciated. Thanks
I can't figure out how to push my API script onto the newly created AWS instance.
This is achieved by using UserData:
When you launch an instance in Amazon EC2, you have the option of passing user data to the instance that can be used to perform common automated configuration tasks and even run scripts after the instance starts.
So basically, you would construct your UserData to install your scripts, all dependencies and run them. This would be executed when new instances are launched.
If you want the system to be scalable, you can launch your instances in an Auto Scaling group and scale it up or down as you require.
The other option is running your scripts as Docker containers, for example using AWS Fargate.
By the way, AWS Lambda has a limit of 15 minutes, so not much more than Google Cloud Functions.
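To make the UserData idea more concrete, here is a minimal sketch of how such a script could be assembled in node before launching the instance. Everything specific in it is an assumption for illustration: buildUserData is a made-up helper, the script URL and /opt/scraper.js path are placeholders, and yum assumes an Amazon Linux style image. The EC2 API expects the user data as base64 text, which is why the helper returns both forms (some tools do the encoding for you).

```javascript
// Hypothetical helper: builds a first-boot shell script and its base64 form.
// scriptUrl and packages are placeholders supplied by the caller.
function buildUserData({ scriptUrl, packages = [] }) {
  const lines = [
    '#!/bin/bash',
    // Power off when the script exits, whether it succeeded or failed,
    // so the instance does not keep running after the scrape.
    "trap 'shutdown -h now' EXIT",
  ];
  if (packages.length > 0) {
    // Assumption: a yum-based image such as Amazon Linux.
    lines.push(`yum install -y ${packages.join(' ')}`);
  }
  // Assumption: the scraper is a single file reachable over HTTPS,
  // and the URL contains no single quotes.
  lines.push(`curl -fsSL -o /opt/scraper.js '${scriptUrl}'`);
  lines.push('node /opt/scraper.js');

  const script = lines.join('\n') + '\n';
  return {
    script,
    base64: Buffer.from(script, 'utf8').toString('base64'),
  };
}

// Intended use (not executed here): pass userData.base64 as the UserData
// parameter of whatever call you use to launch the instance.
//   const userData = buildUserData({
//     scriptUrl: 'https://example.com/scraper.js',
//     packages: ['nodejs'],
//   });
```

If the instance is launched with its shutdown behavior set to terminate rather than stop, the shutdown at the end of the script should also get rid of the instance, which covers the "don't keep charging me" requirement. Supporting a different scraper then only means passing a different scriptUrl.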

Is it possible to build a headless Node-based client from Meteor?

I'm working on a system where a remote machine (hooked up to a projector and some other hardware) is controlled via a Meteor application. Currently, we are using a home-grown DDP client written in C++ to accomplish this, but this approach is not as flexible as I would like:
There is duplication between C++ and JavaScript.
Upgrades are hard because we can't deploy both the server and the client at the same time, so we always have to think about backwards compatibility and ordering.
So I'm toying with the idea of rewriting the Meteor part of the C++ app in JavaScript. What I would like, ideally, is to have a special client of our app (call it headless, akin to server and client) which:
is built from the same source as the rest of the Meteor app, so we can reuse the same business logic as on the server and web client,
runs in Node.js on the client machine so it can access the OS, and
doesn't contain any of the browser code, but adds some other code specific to controlling the machine and communicating with the C++ app.
Even better would be if this client would not contain any of the actual code, but just a piece of bootstrap code. The bootstrapper would download the actual application code from the server and re-download it when the server is updated, in the same way as happens for the HTML client. That would make updates much easier, because we can assume that server and client are always running the same version.
Does such a thing exist? If not, how close can I get without unreasonable effort? Searches for "meteor headless client" and "meteor node client" are not helping me, and the only somewhat related question I could find isn't well answered.
You should be able to get this to work by using the meteor-desktop package to build your remote headless client.
https://www.npmjs.com/package/meteor-desktop#architecture
In Electron app, there are two processes running along in your app.
The so-called main process and renderer process. Main process is just
a JS code executed in node, and the renderer is a Chromium process. In
this integration your Meteor app is being run in the renderer process
and your desktop specific code runs in the main process. They are
communicating through IPC events. Basically, the desktop side
publishes its API as an IPC event listeners. In your Meteor code,
calling it is as simple as Desktop.send('module', 'event');.
This will give you:
os access on this (desktop) client
hot code push (with caveats around the node modules)
provides Meteor.isDesktop to control which code runs on the browser vs the desktop client
If you wish to use the Meteor client as a headless client, and since the client runs in the browser, I'd suggest you look at using a headless browser like PhantomJS, which can run your Meteor code without the UI and has the ability to access the local file system.
Another option, which is not really what you describe but would make everything javascript, is to use the node ddp client, and write your code in modules you can easily import on the node side.
Is there a regular meteor client on the remote machine with custom hardware? Or is that the C++ program acting as a client? And then a server, in addition to your other client browser?
Sounds like you should actually do a few things differently:
Set up a dynamic DNS system with a custom domain and port forwarding so you can use the special hardware remote system as a server.
Run the Meteor server on that remote machine with hardware.
Instead of a full C++ app speaking DDP, just make a Node.js C++ addon that talks to the hardware and use that in the Meteor server code.

Heroku workers in dev

I'm looking into using a worker as well as a web for the first time as I have to scrape a website. I'm just wondering before I commit to this about working in a dev environment. How do jobs in a queue get handled when I'm testing my app before it's pushed to Heroku?
I will probably be using RabbitMQ if that's relevant here.
I guess it depends on what you mean by testing. You can unit test the code that does the scraping in isolation from any queue, and you can provide a mock implementation of the queue operations to handle a goodly portion of your integration tests.
I suppose you might want a real instance of the queue for certain tests, but depending on the nature of your project, you might be satisfied with the sorts of tests described in the first paragraph.
If you simply must test the queue operation and/or you want to run a complete copy of production locally, then you'll have to stand up an instance of RabbitMQ. You can stand one up locally or use one of the SaaS providers.
If you have multiple developers working on the project, you might want to make it easy for them by creating something like a Vagrant script that sets up a complete environment in a VM. Or better still, something like Docker. Doing so also gives you a lot more deployment options (making you less dependent on the Heroku tooling).
Lastly, numerous CI solutions like Travis CI provide instances of popular services for running tests (including rabbit).
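As a rough illustration of the mock-queue approach from the first paragraph, here is a sketch of an in-memory stand-in plus a worker whose page fetcher is injected. InMemoryQueue and makeScrapeWorker are made-up names, and the publish/drain interface is an assumption. It does not mirror any real RabbitMQ client API, so your production code would need a thin wrapper exposing the same two operations.

```javascript
// Hypothetical in-memory stand-in for a message queue, for tests only.
class InMemoryQueue {
  constructor() {
    this.pending = [];
  }

  // Messages are serialised on the way in, since a real broker carries
  // bytes and would not preserve object identity either.
  publish(message) {
    this.pending.push(JSON.stringify(message));
  }

  // Delivers every pending message to the handler, oldest first,
  // and returns how many messages were handled.
  drain(handler) {
    let handled = 0;
    while (this.pending.length > 0) {
      handler(JSON.parse(this.pending.shift()));
      handled += 1;
    }
    return handled;
  }
}

// Hypothetical worker factory: fetchPage is injected so tests can pass
// a fake instead of hitting the real website.
function makeScrapeWorker(fetchPage, results) {
  return (job) => {
    const html = fetchPage(job.url);
    results.push({ url: job.url, length: html.length });
  };
}

// Intended use in a test (not executed here):
//   const queue = new InMemoryQueue();
//   const results = [];
//   queue.publish({ url: 'http://example.com/a' });
//   queue.drain(makeScrapeWorker(() => '<html></html>', results));
```

The point of the design is that the scraping logic never talks to the queue directly, so the same worker function can be driven by this fake in dev and by the real queue on Heroku.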

BrowserStack does not update its capabilities between runs

I was wondering if anyone else knows a good way to start individual BrowserStack tests sequentially using Capybara/BrowserStack/Cucumber.
I'm having issues with Capybara in the sense that BrowserStack doesn't get updated with my new capabilities for every run, even when I shut down my browser, i.e. the two test runs are started sequentially in BrowserStack, but with the same browser and OS settings.
Abstract Scenario: Run login tests
Given that I want to test x website with capabilities og
Examples:
| browser | browser_version | os      | os_version | resolution |
| IE      | 11.0            | Windows | 8.1        | 1024x768   |
| Firefox | 45.0            | Windows | 10         | 1024x768   |
I've checked that every value successfully gets sent through to the next step, but it seems like Browserstack doesn't update its new capabilities that I'm trying to set.
I know I can probably manage to do parallel runs by setting capabilities through settings instead, but our BrowserStack license limits how many parallel runs we can do. That's why I want to run them sequentially and figured this could be a way to do it.
As per my experience, BrowserStack initiates a test on a particular OS/browser capability that it receives from your tests. Thus, it seems your setup is sending the same capability for both the runs of the test.
I believe you want to run tests sequentially and on different OS/browser combinations. In that case you can refer to BrowserStack's documentation for configuring parallel Cucumber tests using a Rake file, in the "Parallel tests" section. After creating all the files, you can run the following command to run the tests sequentially:
rake BS_USERNAME=<username> BS_AUTHKEY=<access_key> nodes=1

Node server GUI frontend

Well, we all know about headless servers. Actually, probably the vast majority of servers out there are headless.
As usual (it seems), my situation asked for quite something else. Basically, the proposed architecture looks more or less like:
The app server (node.js) is situated on a physical machine physically connected to two screens.
Between this machine and the 'net there are all sorts of regular networking layers. Please keep in mind that one of the main reasons for this setup is physical portability: ie, the client gets the necessary hardware as the product. The server itself relies on CDN for static files etc.
Each monitor/screen needs to show something different, produced by the same node server.
For now this server will probably run on Windows, but given a concept (which is what my question is after), I can change the code to run on the target platform. Well, depending on my code, this could even be done automatically.
So, my actual question. Node is quite flexible in that it can be run by anything - even custom made software (C++, Delphi, even GM). Just shell_exec('node server.js') and we're off.
But the screens themselves need to be quite dynamic. So node needs to influence both screens in some way. A few options I'm considering:
A custom app which creates two resizable, featureless windows with an embedded chromium browser to be controlled by the node server somehow (how would node interact with these browsers?)
A custom app which, according to node CLI output, updates the two screens' UI. Since I need something flashy as the UI, this app would be created in something like GameMaker, or a similar engine.
PS: Just in case you're asking: the physical connection, as opposed to a network one (e.g. a web-based GUI frontend), is by design.
I'd just wire up the result/monitoring screens as regular HTML pages. In your Node app, create a second HTTP server (on a non-standard port, firewalled from the public) that serves up the monitoring page.
Use socket.io to send the realtime data to the monitoring page, which can make everything look pretty. Fire it up in a full-screen instance of Chrome.
This approach completely frees you from any kind of platform dependency, and decouples the monitoring app from the server app. It leaves you the latitude to run the monitoring app on a separate box if necessary.
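For what it's worth, a bare-bones sketch of that second server could look like the following. To keep it free of dependencies it swaps socket.io for Server-Sent Events over node's built-in http module, so it is a stand-in for the idea rather than the socket.io setup itself. createMonitorServer, sseFrame, the /events path and port 8081 are all made up for the example.

```javascript
const http = require('http');

// Minimal monitoring page: subscribes to /events and prints each update.
const PAGE = `<!doctype html>
<title>Monitor</title>
<pre id="out">waiting...</pre>
<script>
  new EventSource('/events').onmessage = (e) => {
    document.getElementById('out').textContent = e.data;
  };
</script>`;

// Formats one Server-Sent Events message carrying a JSON payload.
function sseFrame(payload) {
  return `data: ${JSON.stringify(payload)}\n\n`;
}

// Hypothetical factory: returns the HTTP server plus a broadcast function
// that the main app calls whenever there is something new to show.
function createMonitorServer() {
  const clients = new Set();

  const server = http.createServer((req, res) => {
    if (req.url === '/events') {
      // Keep the response open and remember it as a subscriber.
      res.writeHead(200, {
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
        Connection: 'keep-alive',
      });
      clients.add(res);
      req.on('close', () => clients.delete(res));
      return;
    }
    res.writeHead(200, { 'Content-Type': 'text/html; charset=utf-8' });
    res.end(PAGE);
  });

  const broadcast = (payload) => {
    for (const res of clients) res.write(sseFrame(payload));
  };

  return { server, broadcast };
}

// Intended use (not executed here). Binding to 127.0.0.1 keeps the page
// reachable only from the machine the screens are attached to.
//   const monitor = createMonitorServer();
//   monitor.server.listen(8081, '127.0.0.1');
//   monitor.broadcast({ screen: 1, status: 'running' });
```

For two screens showing different things, one option is to serve two different pages (or have the page filter on a field such as screen in the payload) and open one full-screen Chrome window per monitor, for example with Chrome's --kiosk flag.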
