G-WAN, NodeJS, and Streaming - node.js

Does G-WAN spin up a new NodeJS instance for every user request? (i.e. if you're using JavaScript for a servlet) For instance, if 100 users request an action at the same time that's handled by a specific script.
My primary question goes with scripting G-WAN with non-C/C++ languages... can sendfile be used from a JavaScript servlet? I want to stream large files to clients which won't be in the www folder, but rather from a specified file path on the server. Is this possible? If not, can NodeJS's streaming be used in G-WAN?

Does G-WAN spin up a new NodeJS instance for every user request?
Unlike other languages (C/C++, Objective-C/C++, C#, PH7, Java, and Scala), JavaScript is not loaded as a module but is executed as a CGI process, just like Zend PHP or Perl.
So, yes, Node.JS will scale poorly unless you use caching (either G-WAN's or yours).
can sendfile be used from a JavaScript servlet?
Yes, but since G-WAN has its own asynchronous machinery, it is certainly more efficient to do it "the G-WAN way" (as suggested by Ken).
If you insist on using sendfile() from JavaScript, keep in mind that you will have to use it in non-blocking mode and manage the asynchronous events yourself (synchronous calls block the current G-WAN worker thread).
Can I stream files to clients which won't be in the www folder?
Yes, you can just use a system symlink to map a foreign folder to a /www resource - or you can stream contents from within a G-WAN handler or a servlet.

You can stream content from G-WAN; you can stream content from Node.JS. Choosing one or the other depends on what other requirements you have since either can support streaming content for the kind of loads you mention (assuming reasonable system resources). I have a small Node.JS server doing some URL rewrites and reverse-proxy to serve content we license from a 3rd party. It is entirely separate from the G-WAN server, with HAProxy routing requests to either as appropriate. From what I've just learned about JavaScript under G-WAN, I wouldn't want to go that route. From what you are describing, I would stick to a pure G-WAN approach using C (or possibly C++ or one of the others that G-WAN can load as dynamic modules) for writing servlets and handlers.
From personal experience, I recommend C for simplicity, performance and compactness. C++ is also a good choice. G-WAN Servlets and Handlers are often quite small snippets of code - especially compared to writing a complete application - so you may be able to make use of C or C++ here even if you are not expert in those languages.
Take a look at the 10-lines-of-C-code implementation of an FLV streamer near the bottom of the G-WAN User's Manual. Other relevant examples are stream1.c, stream2.c and stream3.c.
To get started, I recommend downloading and installing G-WAN following the 10-second G-WAN installation process, and then tweaking the servlet sample code to serve some content you have (i.e., change the paths and filenames as needed).
Good luck!
Ken

There is also another option for using JS: embedding a VM (SpiderMonkey) directly in a servlet.

Related

Making website with Node.js without framework

I want to create a website based on Node.js and MySQL, but I've read that there is a framework called Express for Node.js, and I'm wondering whether I must use such a framework to create a decent website, or whether it is possible to do without it and just work with pure Node.js.
No framework is required. You can write a full-blown web server using only the http module or if you really want to write everything yourself, you can even do it with only the net module.
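As a rough illustration of the "only the http module" claim, here is a minimal sketch of a server with no framework at all. The function names `handleRequest` and `startServer`, the response texts, and the port number are placeholders chosen for this example, not anything prescribed by Node.js.

```javascript
const http = require('http');

// Request handler kept separate from the server so it can be reused or tested.
function handleRequest(req, res) {
  if (req.method === 'GET' && req.url === '/') {
    res.writeHead(200, { 'Content-Type': 'text/plain; charset=utf-8' });
    res.end('Hello from plain Node.js\n');
  } else {
    res.writeHead(404, { 'Content-Type': 'text/plain; charset=utf-8' });
    res.end('Not found\n');
  }
}

// Call this to start listening; the default port is an arbitrary example value.
function startServer(port = 3000) {
  return http.createServer(handleRequest).listen(port);
}
```

Calling `startServer()` and opening http://localhost:3000/ should show the greeting. Everything beyond this (routing, body parsing, error pages) is code you would write yourself or pull in from a module.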
It's really about what is the most effective use of your time and skill as a developer. Except for academic or pure learning experience reasons, if you're just trying to accomplish a task as efficiently as possible and free, pre-existing, pre-tested code exists that makes your job easier, then that's a better way to go.
For example, if I need to do a file upload from a browser to my back-end and the data is coming in as the multipart/form-data content-type from the browser, I have zero interest in reading and learning the multipart/form-data RFC and then writing my own code to parse the multipart/form-data content-type. Pre-existing, already tested code exists to do that for me, and I'm adding no value to the goals of my project by re-implementing and then testing it all myself. Therefore, I'd like to use a pre-built module that does all that for me. I can just configure the right library on the right route and out plops my uploaded file in only the amount of time it takes to understand the interface to the 3rd party module and how to use it properly.
This is where Express comes in. Not only does it offer a useful set of features and architecture for configuring routes, installing middleware, sending responses, using template engines, handling errors, etc... but there are also thousands of third party modules that are built to hook into Express and it is easiest to use them if you're using Express as your core framework. Some of these modules could be used outside of Express, some cannot - it really depends upon how they're designed and what Express interfaces they do or don't use.
Also, Express is fairly "un-opinionated" and fairly "lightweight" which means it doesn't force you into a particular methodology. It just offers you easier ways to do things you were already going to have to write code for yourself.
Look at it this way. When you get node.js, there are thousands of APIs that offer lots of already tested things such as a TCP library, a file I/O library, etc... Those are frameworks (in a sense) too. You don't have to use them either. You could rewrite whatever functionality you need from scratch. But, you wouldn't even think about doing that because tested code already exists that solves your problem. So, you happily build on top of things that are already done.
One of the BIG advantages of coding with node.js is getting access to the tens of thousands of pre-built modules on NPM that already solve problems that many people have. Coding in node.js with a mindset that you will never use any outside modules from NPM is throwing away one of the biggest advantages of coding with node.js.
could you tell me what are the Routes used for in frameworks?
A route is a URL that you wish for your web server to respond to. So, if you want http://myserver.com/categories to be a URL that your server responds to, then you create a route for /categories so that you can write code for what should happen when that URL is requested. A framework like Express allows you to create that route very simply with just a single statement such as:
app.get('/categories', function(req, res) {
    // put code here to handle that request
});
This is just the tip of the iceberg for what Express supports. It allows you to use wildcards in route definitions, identify parameters in urls, create middleware that does prep work on lots of routes (such as check if the user is logged in), etc...
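To make the idea of routes and URL parameters concrete without depending on Express, here is a small sketch of a route table in plain JavaScript. It is not how Express is implemented; `addRoute`, `matchRoute`, and the `:id` pattern handling are hypothetical helpers written only for this illustration.

```javascript
// Hypothetical mini-router: each route is a method, a path pattern, and a handler.
const routes = [];

function addRoute(method, pattern, handler) {
  routes.push({ method, segments: pattern.split('/').filter(Boolean), handler });
}

// Returns { handler, params } for the first matching route, or null if none match.
function matchRoute(method, urlPath) {
  const parts = urlPath.split('/').filter(Boolean);
  for (const route of routes) {
    if (route.method !== method || route.segments.length !== parts.length) continue;
    const params = {};
    const matches = route.segments.every((segment, i) => {
      if (segment.startsWith(':')) {
        params[segment.slice(1)] = parts[i]; // ":id" captures this path segment
        return true;
      }
      return segment === parts[i];
    });
    if (matches) return { handler: route.handler, params };
  }
  return null;
}

addRoute('GET', '/categories', () => 'all categories');
addRoute('GET', '/categories/:id', (params) => `category ${params.id}`);
```

With these two routes registered, `matchRoute('GET', '/categories/42')` yields a handler plus `params.id === '42'`, while a `POST` to `/categories` matches nothing. A framework adds wildcards, middleware chains, and error handling on top of this basic lookup.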
You don't have to use a framework but it is recommended to use one of them since frameworks like Express make your life easier in many ways. Check this: What is Express.js?
Yes you CAN write a Node.js-based backend without any back end implementation framework such as Express. And if you are using Node.js for the first time without any previous experience of asynchronous coding, I'd advise against using Express, KOA or other Node implementation frameworks for your simple learner apps (e.g. those needing things like register/login form processing, logout button, user preference updates to database, etc) because:
(1) Node.js is a core skill for JavaScript back ends.
Stupid analogies between server tasking and restaurant waiters are no use to a real web engineer. You must first know what exactly Node can/cannot do in the server CPU that makes it different to most other back end technologies. Then you have to see how the Node process actually does this. Using Express/KOA/Hapi/etc you are sometimes effectively removing the mental challenges that come with a Node back end. Any time-saving is achieved at the expense of gaining a proper working understanding of what Node is and how it really operates.
(2) Learning Node.js and its asynchronous coding is hard enough without the added complication of coding with an unknown framework like Express/KOA that assumes users' familiarity with JavaScript constructs like callback functions and Promises. It's always better to learn something in isolation so you get the essence of its individual effects, rather than the overall effects if used with other packages/frameworks. So many of these Node.js Express tutorials are the software equivalent of learning to make a cake by watching Momma do it. We can copy it but we don't know how or why it's working. Professional coders can't just be good copycats.
(3) Available learning tutorials using Express often drag in other technologies like MongoDB, Mongoose, Mustache, Handlebars, etc that make learning Node.js even more awkward still.
(4) A share of basic web apps can be written more efficiently with Node.js, custom JS and existing JS modules imported off the npm repository rather than with Express.
(5) Once asynchronous coding and the JavaScript constructs available to assist with it are understood clearly, pure Node.js apps for basic tasks aren't that hard.
(6) After you do get your head around Node.js and can get basic web app functionalities working using server-side JavaScript constructs, you can then judiciously start to explore Express/Hapi/KOA/etc and see what an implementation framework can do for your workflow when doing larger projects needing numerous functionalities. At this point you know what Express code should be doing and why it is done the way it is.
Node.js has become the back-end technology of choice for most small to medium scale web applications over the last 10 years. It is also the major reason why the JavaScript language has evolved from a mere front-end scripting tool with a limited set of Java-aping constructs to the innovative and comprehensive language that it is today. JavaScript is also among the most popular languages in use today. Investing time in understanding the Node server framework, and the latest JavaScript constructs used in Node, is time well spent. Implementation frameworks such as Express, KOA, Hapi, Sails, etc have great benefit when writing more elaborate back ends on the Node.js platform. But all these implementation frameworks are predicated on the behaviour patterns of Node.js. So unless Node itself is understood first, the full utility of Express/KOA/Sails/etc will never be enjoyed.
Try here for the pure Node.js.

Is Node.js useful for "classic" style websites?

I am considering using nodejs to make a non-realtime app. For example, a website like a blog, forum, or image boards.
I have read that nodejs is good when used for asynchronous jobs. So I am wondering what the result would be when used to serve a lot of static files, like big images, css & js files, etc.
Is it true that when sending a file (suppose it's 2-3MB), the whole server will be blocked until the transfer is complete? I have also read that it might be possible to use the OS's sendfile() syscall to do this job. In this case, does Express support this?
No it is not true. You can easily send files that are large (much larger than 2-3 MB) without blocking. People who complain about things like this blocking the Node event loop just don't know what they're doing.
You don't necessarily need to use Express to get this behavior.
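As a sketch of what non-blocking file delivery can look like without Express, the helper below streams a file in chunks using only core modules. The name `streamFile` and the path in the usage comment are made up for this example, and it relies on ordinary Node streams rather than calling sendfile() directly.

```javascript
const fs = require('fs');
const { pipeline } = require('stream');

// Streams the file at filePath into dest (for example an http.ServerResponse)
// chunk by chunk, so the whole file is never held in memory at once.
// done(err) is called once the transfer has finished or failed.
function streamFile(filePath, dest, done) {
  pipeline(fs.createReadStream(filePath), dest, done);
}

// Hypothetical usage inside an http request handler:
//   streamFile('/srv/media/big-image.jpg', res, (err) => {
//     if (err) console.error('transfer failed:', err.message);
//   });
```

Because the reads and writes are asynchronous, other requests keep being served while a large transfer is in progress. `pipeline` also tears down both streams if either side fails, for instance when the client disconnects midway.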
That being said, if what you want is a file server, there's no reason to use NodeJS. Just point Apache at a directory, and let it fly. Why reinvent the wheel just to use the new sexy technology, when old faithful does just fine?
If you would like to use node as a simple http server, may I recommend the very simple command line module.
https://npmjs.org/package/http-server
I haven't looked at the code of the module, but it is likely not optimized for large files. Let's define large, in this case, as files that are not easily cached in memory (whatever this means for your setup). If your use case calls for more optimization (piping "large" files for example) you may still have to write your own module, but this will get you started very quickly, and is an excellent utility to use for general development when you need to serve up a directory real quick.

nodejs job server (multiple purpose)

I'm fairly new and just getting to know node.js (background as PHP developer). I've seen some nodeJs examples and the video on nodejs website.
Currently I'm running a video site and in the background a lot of tasks have to be executed. Currently this is done by cronjobs that call PHP scripts. The downside of this approach is that when another process is started while the previous one is still working, you get a high load on the servers, etc.
The jobs that need to be done on the server are the following:
Scrape feeds from websites and insert them in mysql database
Fetch data from websites (scraping) (upon request)
Generate data for reporting. These are mostly mysql queries that need to be executed.
Tasks that need to be done in the future
Log video views (when a user visits a video page) (this will also be logged to mysql)
Log visitors in general
Show ads based on searched video
I want to be able to call an url so that a job can be queued and also be able to schedule jobs by time or they can run constantly.
I don't know if node.js is the path to follow, which is why I'm asking here. What are the advantages of doing this in node? The downsides?
What are the pros here with node.js?
Thanks for the response!
While traditionally used for web/network tasks (web servers, IRC chat servers, etc.), Node.js shines when you give it any kind of IO bound (as opposed to CPU bound) task, since it uses completely asynchronous IO (that is, all IO happens outside of the main event loop). For example, Node can easily hold open many sockets, waiting for data on each, or stream data to and from files very efficiently.
It really sounds like you're just looking for a job queue; a popular one is Resque, and though it's written for Ruby, there are versions for PHP, Node.js, and more. There are also job queues built specifically for PHP; if you want to stick to PHP, a Google search for "PHP job queue" may take you far.
Now, one advantage to using Node.js is, again, its ability to handle a lot of IO very easily. Of course I'm just guessing, but based on your requirements, it could be a good tool for the job:
Scrape data/feeds from websites - mostly waiting on network IO
Insert data into MySQL - mostly waiting on network IO
Reporting - again, Node is good at MySQL queries, but probably not so good at analyzing data
Call a URL to schedule a job - Node's built-in HTTP handling and excellent web libraries make this a cinch
So it's entirely possible you may want to experiment with Node for these tasks. If you do, take a look at Resque for Node or another job system like Kue. It's also not very hard to build your own, if you don't need something complicated--Redis is a good tool for this.
There are a few reasons you might not want to use Node. If you're not familiar with JavaScript and evented and continuation-passing style programming, Node.js may have a bit of a learning curve, as you have to force yourself to stop thinking synchronously. Furthermore, if you do have a lot of heavy non-IO tasks in your program, such as analyzing data, Node will not excel as those calculations will block the main event loop and keep Node from handling callbacks, etc. for your asynchronous IO. Finally, if you have a lot of logic already in PHP or another language, it may be easier and/or quicker to find a solution in your language of choice.
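One common way to soften the CPU-bound problem described above is to split a long calculation into slices and yield to the event loop between them. The sketch below assumes a toy workload (summing integers), and `sumInSlices` is a name invented for this example. For genuinely heavy analysis, a separate process or worker thread may be a better fit.

```javascript
// Sums the integers 0..n-1 in slices, yielding to the event loop between slices
// so that pending I/O callbacks and timers get a chance to run.
function sumInSlices(n, sliceSize, done) {
  let i = 0;
  let total = 0;

  function runSlice() {
    const end = Math.min(i + sliceSize, n);
    for (; i < end; i++) total += i;
    if (i < n) {
      setImmediate(runSlice); // let other callbacks run before the next slice
    } else {
      done(total);
    }
  }

  runSlice();
}
```

A single uninterrupted loop over the same range would hold the event loop for its full duration. With slices, other callbacks are interleaved at the cost of a somewhat longer total run time.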
I second the above answers. You don't necessarily need a full-service job queue, however: you can use flow-control modules like async to run tasks in parallel or series, as fast as they'll go or with controlled concurrency.
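The "controlled concurrency" idea can be sketched with plain promises. This is not the API of the async module mentioned above; `runLimited` is a hypothetical helper written for illustration, and it assumes each task is a function that returns a promise.

```javascript
// Runs async task functions with at most `limit` of them in flight at once.
// Resolves with the results in the same order as `tasks`.
async function runLimited(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;

  // Each worker keeps pulling the next unstarted task until none are left.
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workers = Array.from({ length: Math.min(limit, tasks.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

For example, a list of scraping tasks could be run with `runLimited(tasks, 5)` so that no more than five requests are open at a time. In this sketch a rejected task rejects the overall promise; a real job runner would probably record the failure and carry on.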
Node.js has many powerful scraping/parsing tools. This post mentions a few; I just heard about Trumpet recently; there are probably dozens of options. Node.js has a Stream module in core and Request makes HTTP interactions extremely easy.
For timed tasks, the simplest approach is a basic setTimeout/setInterval. Or you could write the scraper as a script that's called on cron. Or have it triggered on some event using the EventEmitter module in core. etc...
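One thing a bare setInterval does not give you is protection against the overlap problem from the question, where a new run starts while the previous one is still working. A minimal sketch of a guard follows; `nonOverlapping` and `scrapeFeeds` are placeholder names, not part of any library.

```javascript
// Wraps an async job so that a new run is skipped while the previous one
// is still in progress. The wrapper resolves to true if the job ran,
// false if the call was skipped.
function nonOverlapping(job) {
  let running = false;
  return async function run() {
    if (running) return false; // previous run still active: skip this tick
    running = true;
    try {
      await job();
      return true;
    } finally {
      running = false; // always release the guard, even if the job throws
    }
  };
}

// Hypothetical usage: scrapeFeeds stands for your own async function.
//   const tick = nonOverlapping(scrapeFeeds);
//   setInterval(() => tick().catch(console.error), 60 * 1000);
```

Skipping a tick is only one possible policy. Queuing the missed run, or scheduling the next run with setTimeout after the current one finishes, are reasonable alternatives depending on the job.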
An uncontrolled number of parallel Node.js jobs may bring down your server. You will need to control the processes or, better, put them in a queue for each task.
For these needs, and if you know PHP, I suggest using Gearman and adding jobs as needed or through small PHP scripts.

Node.js event vs thread programming on server side

We are planning to start a fairly complex web portal which is expected to attract good local traffic, and I've been told by my boss to consider/analyse node.js for the server side.
I think scalability and multi-core support can be handled with an Nginx or Cherokee in front.
1) Is this node.js ready for some serious/big business?
2) Does this 'event/asynchronous' paradigm on the server side have the potential to support heavy traffic and data operations, considering that 'everything' is processed in a single thread and all the live connections would be lost if it crashed (though it's easy to restart)?
3) What are the advantages of event based programming compared to thread based style ? or vice-versa.
(I know of higher cost associated with thread switching but hardware can be squeezed with event model.)
Following are interesting but contradicting (to some extent) papers:-
1) http://www.usenix.org/events/hotos03/tech/full_papers/vonbehren/vonbehren_html
2) http://pdos.csail.mit.edu/~rtm/papers/dabek:event.pdf
Node.js is developing extremely rapidly, and most of its functionality is sturdy and ready for business. However, there are a lot of places where it's lacking, like database drivers, jquery and DOM, multiple http headers, etc. There are plenty of modules coming up tackling every aspect, but for a production environment you'll have to be careful to pick ones that are stable.
It's actually much MUCH more efficient to use a single thread than a thousand (or even fifty) from an operating system perspective, and benchmarks I've read (sorry, don't have them on hand -- will try to find them and link them later) show that it's able to support heavy traffic -- not sure about file-system access though.
Event based programming is:
Cleaner-looking code than threaded code (in JavaScript, that is)
The JavaScript engine is extremely efficient at processing events and handling callbacks, and it's easily one of the languages seeing the most runtime optimization right now.
Harder to fit when you are thinking in terms of control flow. With events, you can never be sure of the flow. However, you can also come to think of it as more dynamic programming. You can treat each event being fired as independent.
It forces you to be more security-conscious when programming, for the above reason. In that sense, it's better than linear systems, where sometimes you take sanitized input for granted.
As for the two papers, both are relatively old. The first benchmarks against this, which as you can see, has a more recent note about these studies:
http://www.eecs.harvard.edu/~mdw/proj/seda/
It also cites the second paper you linked about what they have done, but refuses to comment on its relevance to the comparison between event-based systems and thread-based ones :)
Try yourself to discover the truth
See What is Node.js? where we cover exactly that:
Node in production is definitely possible, but far from the "turn-key" deployment seemingly promised by the docs. With Node v0.6.x, "cluster" has been integrated into the platform, providing one of the essential building blocks, but my "production.js" script is still ~150 lines of logic to handle stuff like creating the log directory, recycling dead workers, etc. For a "serious" production service, you also need to be prepared to throttle incoming connections and do all the stuff that Apache does for PHP. To be fair, Rails has this exact problem. It is solved via two complementary mechanisms: 1) Putting Rails/Node behind a dedicated webserver (written in C and tested to hell and back) like Nginx (or Apache / Lighttpd). The webserver can efficiently serve static content, access logging, rewrite URLs, terminate SSL, enforce access rules, and manage multiple sub-services. For requests that hit the actual node service, the webserver proxies the request through. 2) Using a framework like "Unicorn" that will manage the worker processes, recycle them periodically, etc. I've yet to find a Node serving framework that seems fully baked; it may exist, but I haven't found it yet and still use ~150 lines in my hand-rolled "production.js".
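The "recycling dead workers" part of such a script can be sketched with the core cluster module. This is not the author's production.js, only a minimal illustration; `runClustered` and `startWorker` are names chosen for this example.

```javascript
const cluster = require('cluster');
const os = require('os');

// Newer Node versions call the flag isPrimary; older ones only have isMaster.
const isPrimary = cluster.isPrimary !== undefined ? cluster.isPrimary : cluster.isMaster;

// In the primary process: fork workers and replace any worker that dies.
// In a worker process: run startWorker (for example, start an HTTP server).
function runClustered(startWorker, workerCount = os.cpus().length) {
  if (isPrimary) {
    for (let i = 0; i < workerCount; i++) cluster.fork();
    cluster.on('exit', (worker, code, signal) => {
      console.error(`worker ${worker.process.pid} exited (${signal || code}); restarting`);
      cluster.fork();
    });
  } else {
    startWorker();
  }
}
```

A real deployment script would likely add a restart delay or a crash counter, since a worker that fails immediately on startup would otherwise be re-forked in a tight loop.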

What methods can we use to interoperate programming languages?

What can we do to integrate code written in a language with code written in any other language? Which techniques are more/less known? I know that some/most languages can be compiled to Java bytecode, but what do we do about the rest ?
You mention the "compile to Java" approach, and there's also the "use a .NET language" approach, so let's look at other cases. There are a number of ways you can interoperate, and which one fits depends on what you're trying to accomplish; it's a case-by-case situation. Things that come to mind are:
Web Services (SOAP or REST)
A text (or other) file in the file system
Use of a database to relay state or other data
A messaging environment like MSMQ or MQSeries
TCP sockets or UDP messages
Mailslots and named pipes
It depends on the level of integration you want.
Do you need the code to share data? Use a platform-neutral data format, such as JSON, XML, Protocol Buffers, Thrift etc.
Do you need to be able to ask code written in one language to perform some task for code in the other? Use a web service or similar inter-process communication layer.
Do you need to be able to call the code within a single process? The answer at that point will entirely depend on which languages you're talking about.
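The first two cases can be combined in a small sketch: one program asks another to do some work, and the two exchange data as JSON over stdin/stdout. To keep the example self-contained, the "other program" here is a tiny script run by Node itself via `process.execPath`; in practice it would be an executable written in a different language. `otherProgram` and `callOtherProgram` are hypothetical names.

```javascript
const { execFileSync } = require('child_process');

// Stand-in for a program written in another language: it reads a JSON
// request from stdin and writes a JSON reply to stdout.
const otherProgram = `
  let input = '';
  process.stdin.on('data', (chunk) => { input += chunk; });
  process.stdin.on('end', () => {
    const { values } = JSON.parse(input);
    const sum = values.reduce((a, b) => a + b, 0);
    process.stdout.write(JSON.stringify({ sum }));
  });
`;

// Sends `request` as JSON on the child's stdin and parses its JSON reply.
function callOtherProgram(request) {
  const reply = execFileSync(process.execPath, ['-e', otherProgram], {
    input: JSON.stringify(request),
    encoding: 'utf8',
  });
  return JSON.parse(reply);
}
```

Neither side needs to know what language the other is written in, only the shape of the JSON messages. The synchronous call keeps the sketch short; a server would normally use the asynchronous `execFile` or `spawn` so as not to block while the child runs.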
Direct invocations:
Direct calls (if the compilers understand each other's call stack)
Remote Procedure Call (early 90's)
CORBA (late 90's)
Remote Method Invocation (Java, with RMI stack/library in target environment)
.Net Remoting
Less tightly integrated:
Web services/SOAP
REST
The two I see most often are SWIG and Thrift. The main difference is (IIRC) Thrift opens up a port and puts a server there to marshal the data between the different languages, whereas SWIG builds library interface files and uses those to call the specified methods.
I think there are a few possible relationships among programs in different languages...
They share a runtime (e.g. C# and Visual Basic) and are compiled into the same application/process...
One invokes the other (e.g. a Perl script that invokes a C program)...
They talk to each other via IPC on the box, or over the network (e.g. pipes and web services)...
Unfortunately your question is rather vague.
There are ways to use different languages in the same process, usually by embedding a VM or an interpreter into the executable. If you need to communicate over process boundaries, there are again several possibilities, many of which have already been mentioned in other answers.
I would suggest you refine your question to get more helpful answers.
On the Web, cookies can be set to pass variables between ASP/PHP/JavaScript. On a previous project I worked on, we used this to create a PHP file for downloading PDFs without revealing their location on the file system from an ASP application.
Almost every language that aspires to some kind of systems-development use is capable of linking against external routines through either a standard OS interface or a C function interface. That is what I tend to use.
