I am considering using Node.js to make a non-realtime app, for example a website like a blog, forum, or image board.
I have read that Node.js is good at asynchronous jobs, so I am wondering how it would perform when serving a lot of static files, like big images, CSS & JS files, etc.
Is it true that while sending a file (suppose it's 2-3 MB), the whole server is blocked until the transfer is complete? I have also read that it might be possible to use the OS's sendfile() syscall for this job. If so, does Express support this?
No, it is not true. You can easily send files that are large (much larger than 2-3 MB) without blocking: Node reads and sends the file in chunks through asynchronous I/O, so the event loop stays free between chunks. People who complain about things like this blocking the Node event loop just don't know what they're doing.
You don't necessarily need to use Express to get this behavior.
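As a minimal sketch with nothing but core modules (the file path is hypothetical), streaming a big file looks like this:

    var http = require('http');
    var fs = require('fs');

    http.createServer(function (req, res) {
        // createReadStream reads the file in chunks, and pipe() honors
        // back-pressure, so the event loop is never tied up by a big file.
        var stream = fs.createReadStream('/var/www/big-image.jpg'); // hypothetical path
        stream.on('open', function () {
            res.writeHead(200, { 'Content-Type': 'image/jpeg' });
            stream.pipe(res);
        });
        stream.on('error', function () {
            res.writeHead(404);
            res.end('Not found');
        });
    }).listen(8080);

For what it's worth, Express wraps the same idea in res.sendfile(), so the answer to that part of your question is yes.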
That being said, if what you want is a file server, there's no reason to use NodeJS. Just point Apache at a directory, and let it fly. Why reinvent the wheel just to use the new sexy technology, when old faithful does just fine?
If you would like to use Node as a simple HTTP server, may I recommend the very simple command-line module:
https://npmjs.org/package/http-server
I haven't looked at the module's code, but it is likely not optimized for large files. Let's define "large" in this case as files that are not easily cached in memory (whatever that means for your setup). If your use case calls for more optimization (piping "large" files, for example), you may still have to write your own module, but this will get you started very quickly, and it is an excellent utility for general development when you need to serve up a directory quickly.
Does G-WAN spin up a new NodeJS instance for every user request (i.e., if you're using JavaScript for a servlet)? For instance, what happens if 100 users simultaneously request an action that's handled by a specific script?
My primary question concerns scripting G-WAN with non-C/C++ languages: can sendfile be used from a JavaScript servlet? I want to stream large files to clients which won't be in the www folder, but rather at a specified file path on the server. Is this possible? If not, can NodeJS's streaming be used in G-WAN?
Does G-WAN spin up a new NodeJS instance for every user request?
Unlike the other languages (C/C++, Objective-C/C++, C#, PH7, Java, and Scala), JavaScript is not loaded as a module but rather executed as a CGI process, just like Zend PHP or Perl.
So, yes, Node.JS will scale poorly unless you use caching (either G-WAN's or yours).
can sendfile be used from a JavaScript servlet?
Yes, but since G-WAN has its own asynchronous machinery, it is certainly more efficient to do it "the G-WAN way" (as suggested by Ken).
If you insist on using sendfile() from JavaScript, then keep in mind that you will have to use it in non-blocking mode and manage the asynchronous events yourself (synchronous calls BLOCK the current G-WAN worker thread).
Can I stream files to clients which won't be in the www folder?
Yes, you can just use a system symlink to map a foreign folder to a /www resource - or you can stream contents from within a G-WAN handler or a servlet.
You can stream content from G-WAN; you can stream content from Node.JS. Choosing one or the other depends on what other requirements you have, since either can support streaming content for the kind of loads you mention (assuming reasonable system resources).

I have a small Node.JS server doing some URL rewrites and reverse-proxying to serve content we license from a 3rd party. It is entirely separate from the G-WAN server, with HAProxy routing requests to either as appropriate. From what I've just learned about JavaScript under G-WAN, I wouldn't want to go that route. From what you are describing, I would stick to a pure G-WAN approach, using C (or possibly C++ or one of the others that G-WAN can load as dynamic modules) for writing servlets and handlers.
From personal experience, I recommend C for simplicity, performance and compactness. C++ is also a good choice. G-WAN Servlets and Handlers are often quite small snippets of code - especially compared to writing a complete application - so you may be able to make use of C or C++ here even if you are not expert in those languages.
Take a look at the 10-lines-of-C-code implementation of an FLV streamer near the bottom of the G-WAN User's Manual. Other relevant examples are stream1.c, stream2.c and stream3.c.
To get started, I recommend downloading and installing G-WAN following the 10-second G-WAN installation process, and then tweaking the servlet sample code to serve some content you have (i.e., change the paths and filenames as needed).
Good luck!
Ken
There is also another option: using JS by directly embedding a VM (SpiderMonkey) in a servlet.
I'm evaluating Node.js and trying to see if it will fit our needs. Coming from a Rails world, I have a few unanswered questions despite searching for a long time.
What's the best way to manage assets with Node.js (Express.js)? In Rails, static assets are a) fingerprinted so they can be cached forever, b) minified (JS and CSS), and c) compiled down from SCSS to CSS.
What's the best way to handle images uploaded by users, such as avatars?
Does Grunt help with minifying and gzipping HTML/CSS/JavaScript?
How can I avoid multiple HTTP requests to the server with Node? I don't want to make a separate HTTP request for every JavaScript asset I need. Rails helps here by combining all JS and CSS files.
Also, is MongoDB the preferred solution for most projects? I hear both good and bad things about it, and I'm having a difficult time deciding whether Mongo can handle a read-heavy workload with probably 400-500 GB of data in the long run.
Any help or pointers? Thanks so much for taking the time to point me to the right places.
For each of the points you mentioned, I'll give you a few example modules that might fit your needs. Remember that for every point there are many more modules serving the same purpose:
node-static (as a static file server), node-uglify (for minifying JS code), css-clean (the same for CSS), merge-js, sqwish, and jake can help you with building the website (in this step you could plug in the previous modules)
For handling uploads, node-formidable is pretty famous.
For Grunt, check out this question.
For avoiding multiple requests, check out this question.
I am not sure it is the "preferred" one; it's the NoSQL and JavaScript nature of it that makes it attractive, and there are modules already for every type of database. Mongo should handle that amount of data, but it also depends on how large a single document is; there are some limitations.
There is a GitHub wiki page in the NodeJS project listing and categorizing many of the important modules out there.
The exact choice of modules also depends on what framework you will use to build the app. A pretty well-established one (but certainly not the only one) is Express. You can find more on this topic here.
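Since Express was just mentioned: a minimal sketch of serving static assets with far-future caching via its static middleware (the folder and maxAge value are just assumptions). Once filenames are fingerprinted, maxAge is what lets browsers cache them "forever":

    var express = require('express');
    var app = express();

    // Serve ./public at the site root; maxAge sets Cache-Control so
    // fingerprinted assets can be cached aggressively by browsers and CDNs.
    app.use(express.static(__dirname + '/public', { maxAge: 365 * 24 * 60 * 60 * 1000 }));

    app.listen(3000);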
I want to make a program (more precisely, a service) that periodically scans directories to find some video files (.avi, .mkv, etc) and automatically download some associated files (mostly subtitles) from one or several websites.
This program should run on Linux as well as on Windows.
On one hand, I have known Qt well for a long time and I know all its benefits; on the other hand, I'm attracted by Node.js and its extreme flexibility and liveliness.
I need to offer some interactivity to the end user of my program (for instance, choosing the directories to scan, etc.).
What would be the best choice in your opinion in 2013?
I advise against Node.js for "small tools and programs", especially for iterative tasks.
The long story
The reason is quite simply the way Node.js works. Its asynchronous model makes simple tasks unnecessarily convoluted. Additionally, because many callbacks are invoked from the Node.js event loop, you can't just wrap them in try/catch structures, so every tiny error will crash your whole application.
Of course there are ways to catch those errors or work around them, but the docs advise against all of them and recommend restarting the application gracefully in any case to prevent memory leaks. This means you have to implement yet another piece of code.
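A minimal sketch of both points, the try/catch that never fires and the catch-all that the docs discourage relying on:

    // This try/catch does NOT catch the error: by the time the callback
    // runs, the try block has long since returned.
    try {
        setTimeout(function () {
            throw new Error('boom'); // crashes the whole process
        }, 100);
    } catch (err) {
        console.error('never reached:', err.message);
    }

    // The catch-all: you can log here, but the docs still recommend
    // exiting and restarting gracefully instead of resuming.
    process.on('uncaughtException', function (err) {
        console.error('uncaught:', err.message);
        process.exit(1);
    });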
The only real solution in Node.js would be writing your Application as a Cluster, which is a great concept but of course would require you to use some kind of IPC to get your data back to a process that can handle it.
Also, since you wrote about periodically scanning a directory, I want to point out that you should...
Use file system watchers for services
Almost every language kit has those now and I strongly suggest using those and only use a fallback full-scan.
In Qt there is a system-independent class, QFileSystemWatcher, that provides a handy callback whenever specified files are changed
In Java there is java.nio.file.FileSystem.newWatchService()
Node.js has the fs.watch function, if you really want to go for it
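A minimal sketch with fs.watch (the directory path is hypothetical; note that fs.watch semantics vary by platform, events can fire twice and filename may be null, so treat it as a trigger rather than a reliable log):

    var fs = require('fs');
    var path = require('path');

    var watched = '/srv/videos'; // hypothetical directory

    // Fires on create/change events inside the directory instead of polling it.
    fs.watch(watched, function (event, filename) {
        if (filename && /\.(avi|mkv)$/i.test(filename)) {
            console.log(event, path.join(watched, filename));
            // kick off the subtitle download here
        }
    });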
Is there a way to precompile node.js scripts and distribute the binary files instead of source files?
Node already does this.
By "this" I mean creating machine-executable binary code. It does this using the JIT pattern though. More on that after I cover what others Googling for this may be searching for...
OS-native binary executable...
If by binary file instead of source, you mean a native-OS executable, yes. NW.JS and Electron both do a stellar job.
Use binaries in your node.js scripts...
If by binary file instead of source, you mean the ability to compile part of your script into binary so that it's difficult or impossible for others to read, or you want something with machine-native speed, yes.
They are called C/C++ Addons. You can distribute a binary (for your particular OS) and call it just like you would any other module, with var n = require("blah");
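A minimal sketch of loading one (the path assumes a default node-gyp build layout, and hello() is a hypothetical function the addon exports):

    // A compiled addon loads exactly like a regular module.
    var addon = require('./build/Release/addon.node');
    console.log(addon.hello()); // calls into native code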
Node uses binaries "Just In Time"
Out of the box, Node pre-compiles your scripts on its own and creates cached V8 machine code (think "executable": it uses real machine code native to the CPU Node is running on), which it then executes with each event it processes.
Here is a Google reference explaining that the V8 engine actually compiles to real machine code, and not a VM.
Google V8 JavaScript Engine Design
This compiling takes place when your application is first loaded.
It caches these bits of code as "modules" as soon as you invoke a "require('module')" instruction.
It does not wait for your entire application to be processed, but pre-compiles each module as each "require" is encountered.
Everything inside the require is compiled and introduced into memory, including its variables and active state. Again, contrary to many popular blog articles, this is executed as individual machine-code processes. There is no VM, and nothing is interpreted. The JavaScript source is essentially compiled into an executable in memory.
This is why each module can just reference the same require and not create a bunch of overhead; it's just referencing a pre-compiled and existing object in memory, not "re-requiring" the entire module.
You can force it to recompile any module at any time. It's lesser-known that you actually have control over re-compiling these objects very easily, enabling you to "hot-reload" pieces of your application without reloading the entire thing.
A great use-case for this is creating self-modifying code: for example, a strategy pattern that loads strategies from folders. As soon as a new folder is added, your own code can re-compile the folders into an in-line strategy pattern, create a "strategyRouter.js" file, and then invalidate the Node cache for your router. This forces Node to recompile only that module, which is then used on future client requests.
The end result: Node can hot-reload routes or strategies as soon as you drop a new file or folder into your application. No need to restart your app, no need to separate stateless and stateful operations: Just write responses as regular Node modules and have them recompile when they change.
Note: Before people tell me self-modifying code is as bad as or worse than eval, terrible for debugging and impossible to maintain, please note that Node itself does this, and so do many popular Node frameworks. I am not presenting original research; I am explaining the abilities of Google's V8 engine (and hence Node) by design, as this question asks us to do. Please don't shoot people who R the FM, or people will stop R'ing it and keep it to themselves.
"Unix was not designed to stop its users from doing stupid things, as
that would also stop them from doing clever things." – Doug Gwyn
Angular 2, Meteor, the new open-source Node-based Light Table IDE, and a bunch of other frameworks are headed in this direction, in order to further remove the developer from the code and bring them closer to the application.
How do I recompile (hot-reload) a required Node module?
It's actually really easy... Here is a hot-reloading npm package; for alternatives, just Google "node require hot-reload":
https://www.npmjs.com/package/hot-reload
What if I want to build my own framework and hot-reload in an amazing new way?
That, like many things in Node, is surprisingly easy too. Node is like jQuery for servers! ;D
stackoverflow - invalidating Node's require cache
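And for the hand-rolled approach, a minimal sketch of the cache invalidation itself (strategyRouter.js being the hypothetical module from the example above):

    // Drop the cached module object; the next require() re-compiles it from disk.
    function hotRequire(modulePath) {
        var resolved = require.resolve(modulePath);
        delete require.cache[resolved];
        return require(resolved);
    }

    // e.g. inside an fs.watch callback for the strategies folder:
    var router = hotRequire('./strategyRouter.js');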
I'm fairly new to Node.js and just getting to know it (my background is as a PHP developer). I've seen some Node.js examples and the video on the nodejs website.
Currently I'm running a video site, and in the background a lot of tasks have to be executed. At the moment this is done by cron jobs that call PHP scripts. The downside of this approach is that when a new process is started while the previous one is still working, you get a high load on the servers, etc.
The jobs that need to be done on the server are the following:
Scrape feeds from websites and insert them in mysql database
Fetch data from websites (scraping) (upon request)
Generate data for reporting. These are mostly mysql queries that need to be executed.
Tasks that need to be done in the future
Log video views (when a user visits a video page) (this will also be logged to mysql)
Log visitors in general
Show ads based on searched video
I want to be able to call a URL so that a job is queued, and also to be able to schedule jobs by time or have them run constantly.
I don't know if Node.js is the path to follow; that's why I'm asking here. What are the pros and cons of doing this in Node?
Thanks for the response!
While traditionally used for web/network tasks (web servers, IRC chat servers, etc.), Node.js shines when you give it any kind of IO bound (as opposed to CPU bound) task, since it uses completely asynchronous IO (that is, all IO happens outside of the main event loop). For example, Node can easily hold open many sockets, waiting for data on each, or stream data to and from files very efficiently.
It really sounds like you're just looking for a job queue; a popular one is Resque, and though it's written for Ruby, there are versions for PHP, Node.js, and more. There are also job queues built specifically for PHP; if you want to stick to PHP, a Google search for "PHP job queue" may take you far.
Now, one advantage to using Node.js is, again, its ability to handle a lot of IO very easily. Of course I'm just guessing, but based on your requirements, it could be a good tool for the job:
Scrape data/feeds from websites - mostly waiting on network IO
Insert data into MySQL - mostly waiting on network IO
Reporting - again, Node is good at MySQL queries, but probably not so good at analyzing data
Call a URL to schedule a job - Node's built-in HTTP handling and excellent web libraries make this a cinch
So it's entirely possible you may want to experiment with Node for these tasks. If you do, take a look at Resque for Node or another job system like Kue. It's also not very hard to build your own if you don't need something complicated; Redis is a good tool for this.
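A minimal sketch of the call-a-URL-to-queue-a-job idea using Kue (it assumes a local Redis instance; the job type and feed URL are made up):

    var http = require('http');
    var kue = require('kue');
    var queue = kue.createQueue();

    // Hitting this URL enqueues a job instead of doing the work inline.
    http.createServer(function (req, res) {
        queue.create('scrape-feed', { url: 'http://example.com/feed' })
             .save(function (err) {
                 res.end(err ? 'error' : 'queued');
             });
    }).listen(3000);

    // A worker picks jobs up one at a time and reports completion.
    queue.process('scrape-feed', function (job, done) {
        console.log('scraping', job.data.url);
        done(); // pass an Error to mark the job as failed
    });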
There are a few reasons you might not want to use Node. If you're not familiar with JavaScript and evented and continuation-passing style programming, Node.js may have a bit of a learning curve, as you have to force yourself to stop thinking synchronously. Furthermore, if you do have a lot of heavy non-IO tasks in your program, such as analyzing data, Node will not excel as those calculations will block the main event loop and keep Node from handling callbacks, etc. for your asynchronous IO. Finally, if you have a lot of logic already in PHP or another language, it may be easier and/or quicker to find a solution in your language of choice.
I second the above answers. You don't necessarily need a full-service job queue, however: you can use flow-control modules like async to run tasks in parallel or series, as fast as they'll go or with controlled concurrency.
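For example, a minimal sketch with async's eachLimit, assuming a hypothetical scrapeFeed(url, callback) function:

    var async = require('async');

    var feeds = ['http://a.example/feed', 'http://b.example/feed'];

    // Scrape all feeds, but never more than three at a time.
    async.eachLimit(feeds, 3, function (url, callback) {
        scrapeFeed(url, callback); // hypothetical scraper
    }, function (err) {
        if (err) console.error('a scrape failed:', err);
        else console.log('all feeds done');
    });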
Node.js has many powerful scraping/parsing tools. This post mentions a few; I just heard about Trumpet recently; there are probably dozens of options. Node.js has a Stream module in core and Request makes HTTP interactions extremely easy.
For timed tasks, the simplest approach is a basic setTimeout/setInterval. Or you could write the scraper as a script that's called by cron. Or have it triggered by some event, using the EventEmitter class from the events module in core. Etc...
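As a minimal sketch of the setInterval route, with a guard against the overlapping-runs problem described in the question (scrapeAllFeeds is a hypothetical function taking a completion callback):

    var running = false;

    // One full pass every 15 minutes; a slow pass is skipped over
    // rather than overlapped by the next tick.
    setInterval(function () {
        if (running) return;
        running = true;
        scrapeAllFeeds(function () { running = false; });
    }, 15 * 60 * 1000);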
An uncontrolled number of parallel Node.js jobs may bring your server down. You will need to control the processes or, better, put them in a queue, one per task type.
For these needs, if you know PHP, I suggest using Gearman and adding jobs as needed via small PHP scripts.