How to prevent malicious *.js scripts from executing in Node.js

I'm using Node.js to build a web service. The implementation uses many third-party modules installed via npm, which is a security risk if any of them contains malicious *.js scripts. For example, malicious code could delete all the files on my disk, or silently collect secret data.
I have a couple of questions regarding this.
How can I detect whether a module has a security issue?
What should I do to prevent malicious *.js scripts from executing in Node.js?
I would appreciate it if you could share any experience you have building Node.js services.
Thanks,
Jeffrey

One concern you did not raise is that a module might try to connect directly to your database, or to other services on your internal network. This can be mitigated by protecting those services with credentials that the module cannot easily find.
1. Restricting disk access
This project was presented at NodeConf last year. It attempts to restrict filesystem access in precisely the situation you describe.
https://github.com/yahoo/fs-lock
"The goal for this module is to help when you are loading 3rd party modules and you need to restrict their access."
It sounds rather like the proposal Jeffrey made in the comments on Plato's answer.
(If you want to look further into hooking OS calls, this hookit project may offer a few ideas. Although in its current form it only wraps the callback function, it might provide inspiration for what to hook, and how. Here is an example of it being used.)
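To make the idea of hooking filesystem calls concrete, here is a minimal sketch that monkey-patches a few synchronous fs methods so that paths outside an allowed root throw. It is not fs-lock's implementation or API; isInside and lockFs are hypothetical names. It only covers the listed methods with string paths, and any module that grabbed a reference to the original functions (or uses fs.promises) before the patch is unaffected, so treat it as an illustration rather than a sandbox.

```javascript
const fs = require("fs");
const path = require("path");

// Hypothetical helper: true if `target` resolves to `root` or somewhere below it.
function isInside(root, target) {
  const rel = path.relative(path.resolve(root), path.resolve(target));
  return rel === "" || (rel !== ".." && !rel.startsWith(".." + path.sep) && !path.isAbsolute(rel));
}

// Replace a few synchronous fs methods with wrappers that refuse string paths
// outside `root`. Returns a function that restores the originals.
// NOTE: fs.promises, the callback APIs, Buffer/URL paths, and references
// captured before this runs are NOT covered.
function lockFs(root, methods = ["readFileSync", "writeFileSync", "unlinkSync"]) {
  const originals = {};
  for (const name of methods) {
    const original = fs[name];
    if (typeof original !== "function") continue;
    originals[name] = original;
    fs[name] = function (target, ...rest) {
      if (typeof target === "string" && !isInside(root, target)) {
        throw new Error(`fs.${name} blocked outside ${root}: ${target}`);
      }
      return original.call(fs, target, ...rest);
    };
  }
  return () => Object.assign(fs, originals);
}
```

You would call lockFs before requiring the untrusted module and keep the returned restore function for later; a real solution has to intercept calls at a lower level than this.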
2. Analyse flow of sensitive data
If you are only worried about data theft (not filesystem or database access), then you can narrow your focus:
You should be most concerned about the packages that are passed sensitive data. Presumably some of the data in your web service is presented to the public anyway!
Most packages will not have access to the full stack of your application, only the bits of data you pass them. If a package is only being passed a small amount of sensitive data, and never passed the rest of the data, it may not be able to do anything malicious with the data it receives. (For example, if you pass all your usernames to one package for processing and all your addresses to a different package, that is a much smaller concern than if you pass all your usernames, addresses and credit-card numbers to the same package!)
Identify the sensitive data in your app, and note which functions in which modules they are passed to.
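One way to build that list is to wrap a module's exports and record which functions receive sensitive values. This is a hypothetical sketch: auditExports, hasCard and the "formatter" module are illustrative names, the creditCard predicate is an assumption about what your sensitive data looks like, and only top-level function exports are wrapped.

```javascript
// Hypothetical helper: wrap a module's exported functions so every call whose
// arguments look sensitive is recorded as "moduleName.functionName".
// Only top-level function exports are wrapped; a module that exports a single
// function, or returns objects with further methods, needs more work.
function auditExports(moduleName, exported, isSensitive, log = []) {
  const wrapped = {};
  for (const [key, value] of Object.entries(exported)) {
    if (typeof value !== "function") {
      wrapped[key] = value;
      continue;
    }
    wrapped[key] = function (...args) {
      if (args.some(isSensitive)) log.push(`${moduleName}.${key}`);
      return value.apply(this, args);
    };
  }
  return { wrapped, log };
}

// Example predicate (assumption): any object with a `creditCard` field is sensitive.
const hasCard = (arg) => typeof arg === "object" && arg !== null && "creditCard" in arg;

// Usage with a stand-in module:
const { wrapped, log } = auditExports("formatter", { format: (u) => JSON.stringify(u) }, hasCard);
wrapped.format({ name: "alice" });
wrapped.format({ name: "bob", creditCard: "4111 1111 1111 1111" });
console.log(log); // [ 'formatter.format' ]
```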
3. Perform efficient code review
You may not need to go to GitHub to read the code. The great majority of packages ship all their source code in their install folder inside node_modules. (A few packages provide binaries, however; these are naturally harder to verify.)
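To see which installed packages ship compiled native add-ons (the binaries mentioned above), a short directory walk is enough. This is a sketch under the assumption that native add-ons use the conventional .node file extension; findNativeAddons is a hypothetical name. Symlinked directories are not followed, because Dirent.isDirectory() returns false for symlinks.

```javascript
const fs = require("fs");
const path = require("path");

// Recursively collect compiled native add-ons (*.node files) below `dir`,
// e.g. findNativeAddons("node_modules"). These are binaries, so their
// behaviour cannot be checked by reading JavaScript source.
function findNativeAddons(dir, found = []) {
  let entries;
  try {
    entries = fs.readdirSync(dir, { withFileTypes: true });
  } catch {
    return found; // unreadable or missing directory
  }
  for (const entry of entries) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) findNativeAddons(full, found);
    else if (entry.isFile() && entry.name.endsWith(".node")) found.push(full);
  }
  return found;
}
```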
If you do want to check the code yourself, there may be ways to reduce the amount of work involved:
To secure your own app, you do not need to read the entire source code of all packages in your project. You only need to review those functions which are actually called.
You may trace the code by reading it, or with the aid of a text-based debugger, or a GUI debugger. (Of course you should look out for branching, where different inputs may cause different parts of the module to be called.)
Set breakpoints when you call into a module which you don't trust, so you can step through the code that is called and see what it does. You may be able to conclude that only a small part of the module is used, so only that code needs to be verified.
Tracing the flow at runtime should cover concerns about sensitive data. To check for file or database access, however, you should also look at the initialisation code of each module that is required (its top-level code runs as soon as it is required), and at all calls, including requires, made from there.
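To narrow the review to code that actually runs, you can inspect require.cache after exercising your app: every CommonJS file that was loaded (and whose initialisation code therefore ran) appears there. A sketch, assuming CommonJS; modules loaded with ESM import do not show up in require.cache. loadedPackageFiles is a hypothetical name and "./app" in the usage comment is a placeholder.

```javascript
// Group the files in require.cache by the npm package that owns them.
function loadedPackageFiles(cache = require.cache) {
  const byPackage = new Map();
  for (const file of Object.keys(cache)) {
    const parts = file.split(/[\\/]/);
    // Use the LAST node_modules segment so nested dependencies are attributed
    // to the package that actually owns the file.
    const i = parts.lastIndexOf("node_modules");
    if (i === -1 || i + 1 >= parts.length) continue; // not inside node_modules
    const scoped = parts[i + 1].startsWith("@");
    if (scoped && i + 2 >= parts.length) continue;
    const name = scoped ? `${parts[i + 1]}/${parts[i + 2]}` : parts[i + 1];
    if (!byPackage.has(name)) byPackage.set(name, []);
    byPackage.get(name).push(file);
  }
  return byPackage;
}

// Usage (hypothetical entry point): exercise the code paths you care about,
// then list which dependency files were actually loaded.
// require("./app");
// for (const [name, files] of loadedPackageFiles()) console.log(name, files.length);
```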
4. Other measures
It might be wise to lock the version number of each package in package.json so that you don't accidentally install a new version of a package until you decide that you need to.
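npm can write exact versions for you with npm install --save-exact (or save-exact=true in .npmrc), and package-lock.json records the exact resolved tree, including transitive dependencies that package.json alone does not pin. The hypothetical helper below is a sketch that lists direct dependencies whose spec is not an exact x.y.z version.

```javascript
// Hypothetical check: list dependencies whose version spec is not an exact
// "x.y.z" version (ranges like ^1.2.3 or ~1.2.3, tags like "latest", URLs...).
function findUnpinned(pkg) {
  const exact = /^\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?(?:\+[0-9A-Za-z.-]+)?$/;
  const unpinned = [];
  for (const field of ["dependencies", "devDependencies", "optionalDependencies"]) {
    for (const [name, spec] of Object.entries(pkg[field] || {})) {
      if (!exact.test(spec)) unpinned.push(`${field}: ${name}@${spec}`);
    }
  }
  return unpinned;
}

// Usage, from the project root:
// const pkg = JSON.parse(require("fs").readFileSync("package.json", "utf8"));
// console.log(findUnpinned(pkg));
```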
You may use social factors to build confidence in a package. Check the respectability of the author. Who is he, and who does he work for? Do the author and his employers have a reputation to uphold? Similarly, who uses his project? If the package is very popular, and used by industry giants, it is likely that others have already reviewed the code.
You may wish to visit GitHub and enable notifications for all the top-level modules you are using by "watching" their repositories. This will inform you if any vulnerabilities are reported in those packages in future.

Most (all?) modules have their source code available on GitHub; you can read through it and look for security problems, or hire a security professional to do the job.
I just take the risk, although I tend to use popular packages with hundreds of commits, active maintenance, and active issue lists.
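If you do read through the source, a crude pattern scan can tell you which files to read first. This is a hypothetical sketch: the red-flag list is an assumption, every pattern also appears in plenty of legitimate packages, only CommonJS require calls are matched, and obfuscated code will evade it. A hit means "read this file carefully", nothing more.

```javascript
const fs = require("fs");
const path = require("path");

// Naive, hypothetical red-flag list (legitimate code triggers these too).
const RED_FLAGS = [
  { name: "child_process", re: /require\(\s*['"](?:node:)?child_process['"]\s*\)/ },
  { name: "eval / new Function", re: /\beval\s*\(|\bnew\s+Function\s*\(/ },
  { name: "network", re: /require\(\s*['"](?:node:)?(?:https?|net|dgram)['"]\s*\)/ },
  { name: "env access", re: /process\.env\b/ },
  { name: "long base64 blob", re: /['"][A-Za-z0-9+\/]{200,}={0,2}['"]/ },
];

// Return the names of all red flags found in one source string.
function scanSource(source) {
  return RED_FLAGS.filter(({ re }) => re.test(source)).map(({ name }) => name);
}

// Walk `dir` and report every .js/.cjs/.mjs file with at least one red flag.
function scanDir(dir, report = {}) {
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    const full = path.join(dir, entry.name);
    if (entry.isDirectory()) scanDir(full, report);
    else if (entry.isFile() && /\.(c|m)?js$/.test(entry.name)) {
      const hits = scanSource(fs.readFileSync(full, "utf8"));
      if (hits.length) report[full] = hits;
    }
  }
  return report;
}
```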

If your project dependency tree is large enough, reviewing all of your dependencies is not a feasible long-term strategy.
The original answer from Joey has some good countermeasures you can use for specific scenarios. I've also seen https://github.com/berstend/node-safe, which could make you slightly safer on macOS.
A general solution to the problem is taking shape though.
How to protect a project from malicious packages
make sure you don't run lifecycle (postinstall) scripts unless they're known and necessary (see my talk on this topic)
put third-party code in a compartment, lock down the environment, and decide which powerful APIs to pass to each package.
The second step requires the use of Compartment, which is a work-in-progress in TC39 https://github.com/tc39/proposal-compartments/
But a shim exists, and some tooling has been built on top of it.
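Compartment is not built into Node. To illustrate only the shape of the idea (code evaluated against an explicit set of globals), here is a sketch using Node's built-in vm module. Be aware that Node's own documentation states vm is not a security mechanism: code run this way can escape to the host realm, for example through objects passed in from the host. That robustness gap is what SES/LavaMoat address. evaluateWithGlobals is a hypothetical name.

```javascript
const vm = require("vm");

// Evaluate `source` in a fresh V8 context whose only named globals are the
// standard JavaScript built-ins plus whatever is passed in `endowments`.
// WARNING: node:vm is NOT a security boundary; this is an illustration only.
function evaluateWithGlobals(source, endowments = {}) {
  const context = vm.createContext({ ...endowments });
  return vm.runInContext(source, context, { timeout: 1000 });
}

// Usage: the guest sees `log` because we passed it, and nothing else from Node.
const messages = [];
evaluateWithGlobals('log("hello from the guest")', { log: (m) => messages.push(String(m)) });
console.log(evaluateWithGlobals("typeof require + ' ' + typeof process")); // undefined undefined
```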
You could use the SES-shim directly and implement your own controls, or use the convenience of LavaMoat
LavaMoat lets you generate and tweak a per-package policy where you can decide which globals and builtins it should have access to.
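The core idea of a per-package policy can be sketched as a require function that only hands out the modules a package has been granted. The policy shape below is hypothetical and is NOT LavaMoat's real policy format; makeGuardedRequire, "left-pad" and "image-resizer" are illustrative. Note that this only means something if the package cannot reach the real require some other way, which is exactly what the compartment provides.

```javascript
// Hypothetical, simplified policy (NOT LavaMoat's actual format):
// each package lists the Node builtins it may require; everything else is refused.
const policy = {
  "left-pad": { builtins: [] },
  "image-resizer": { builtins: ["fs", "path"] },
};

// Build a `require` replacement for one package that refuses any module
// the policy does not grant it.
function makeGuardedRequire(pkgName, policy, realRequire = require) {
  const entry = policy[pkgName];
  const allowed = new Set((entry && entry.builtins) || []);
  return function guardedRequire(id) {
    const name = id.startsWith("node:") ? id.slice(5) : id;
    if (!allowed.has(name)) {
      throw new Error(`${pkgName} is not allowed to require "${id}"`);
    }
    return realRequire(id);
  };
}
```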
LavaMoat also offers a tool to manage install scripts.
Here's my talk on SES and LavaMoat with a demo at the end.
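Before deciding which lifecycle scripts to allow (the first countermeasure above), it helps to see which installed packages declare them. A sketch with a hypothetical findInstallScripts helper: it reads each package.json under node_modules (including scoped and nested packages) and reports preinstall, install and postinstall scripts. It also flags a binding.gyp without an install or preinstall script, because npm's documented default in that case is to run node-gyp rebuild.

```javascript
const fs = require("fs");
const path = require("path");

const LIFECYCLE = ["preinstall", "install", "postinstall"];

// List installed packages that would run code at install time.
function findInstallScripts(nodeModules, results = []) {
  let entries;
  try {
    entries = fs.readdirSync(nodeModules, { withFileTypes: true });
  } catch {
    return results; // no node_modules here
  }
  for (const entry of entries) {
    if (!entry.isDirectory() || entry.name.startsWith(".")) continue; // skip .bin etc.
    const dir = path.join(nodeModules, entry.name);
    if (entry.name.startsWith("@")) {
      findInstallScripts(dir, results); // scope folder: packages are one level down
      continue;
    }
    let pkg;
    try {
      pkg = JSON.parse(fs.readFileSync(path.join(dir, "package.json"), "utf8"));
    } catch {
      continue; // not a package
    }
    const scripts = pkg.scripts || {};
    const found = LIFECYCLE.filter((s) => typeof scripts[s] === "string").map((s) => `${s}: ${scripts[s]}`);
    if (!scripts.install && !scripts.preinstall && fs.existsSync(path.join(dir, "binding.gyp"))) {
      found.push("binding.gyp (npm runs node-gyp rebuild)");
    }
    if (found.length) results.push({ name: pkg.name || entry.name, scripts: found });
    findInstallScripts(path.join(dir, "node_modules"), results); // nested dependencies
  }
  return results;
}

// Usage: console.log(findInstallScripts("node_modules"));
```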
How to set up LavaMoat
See LavaMoat docs for more details
disable/allow dependency lifecycle scripts (e.g. "postinstall") via @lavamoat/allow-scripts
npm i --ignore-scripts -D @lavamoat/allow-scripts
npx --no-install allow-scripts setup
npx --no-install allow-scripts auto
then, edit the allow-list in package.json
after every install/reinstall, run allow-scripts
run your server or build process in lavamoat-node
npm i -D lavamoat
in your package.json add something like:
"scripts": {
  "lavamoat-policy": "lavamoat app.js --autopolicy",
  "start": "lavamoat app.js"
}
run npm run lavamoat-policy every time you make changes to your dependency tree, and review the generated policy (see also: policy override)
run npm start to start your app
Disclaimer: I contribute to LavaMoat and Endo. They are Open Source projects on permissive licenses.

Related

Terraform providers vulnerability detection

Using a lot of (official and non-official) Terraform providers, I'm looking for a tool to perform security analysis on Terraform providers before executing terraform plan/apply commands (and so executing the providers' code). I want to prevent malicious code in providers from being executed blindly.
I'm basically executing the terraform providers mirror command to save local copies of the required providers, and I'm wondering if I can security-scan that result.
I tested kics, checkov and tfsec, but they all look for security issues in my own Terraform code, not in the providers.
Do you have any good advice on this topic?
This is actually quite a good question. Many other problems reduce to the same generic question: how do you make sure that something you downloaded from the internet does not do anything malicious to you? For example:
How do you make sure that a Minecraft plugin does not hack you?
How do you make sure that a Spring Boot dependency does not hack you?
How do you make sure that a library xxx you add to your project does not harm you?
Should you use docker image yyy in your project?
Truth is: everything you use has the potential to explode right in your face (or, more precisely, right in the face of the system owner). That's why the system owner (usually a company) defines a set of rules about what is allowed and what is not. Not aware of any such rules? Below is a set of rules we came up with ourselves when thinking about on-boarding a new library for some projects to use:
Do not take random stuff from GitHub. Take only products with a longer history, a small bug backlog, few or no past entries in the CVE list, and active maintenance.
Do static code analysis yourself. Sometimes it is possible to have tools that work on binaries level do that for you. Sometimes you can do it on source level only. In case of Java libraries, check what tools like Dependency Track think about the library and version you are about to use.
Run the code and see how it works: what does it write, what does it read, what URLs does it communicate with (do a TCP dump if necessary).
Document everything you have done somewhere.
This does not give you 100% confidence that things will not go terribly wrong. But it is a systematic approach that reduces the risk of doing something stupid.

When should I create my own module package instead of using other packages?

I'm still a new node js developer, currently building a personal project, and I recently found out that there are open source packages available on npm similar to the thing I'm developing.
These packages include advanced concepts that I haven't come up with yet and provide more options than I need. After thinking it over, I wondered: why not develop a package that serves my project exactly the way I want, instead of using packages of which I won't use more than 5% of the functions?
Benefits of using an existing, well-supported module:
You save your development time for things that haven't already been written by someone else allowing you to make faster progress on your project
Well tested by the community (pre-tested code saves you lots of time)
Other people finding and fixing bugs (don't underestimate the importance of this)
The code will likely be kept up-to-date as tech changes over time
Possible community of people to ask questions of that knows about that package
Non-issues with using an existing, well-supported module:
Code size is rarely an issue for server-side nodejs development so the fact that a package may contain extra code that you don't need is generally not a practical issue of any consequence. If code size is paramount (like say you were running on a small, embedded system), then nodejs itself might not be the right environment as it's not exactly compact.
Reasons not to use an existing, well-supported module:
You aren't allowed to use open-source code in your project (but then you wouldn't be using nodejs if that was the case).
No existing module does what you want.
Existing modules that do what you want don't appear to be well supported or have many relevant bugs that have been open for a long time. In this case, it still might be worth it for you to clone the repository and use it as a starting point or learning point for your own module.
I'm still a new node js developer, currently building a personal project, and I recently found out that there are open source packages available on npm similar to the thing I'm developing.
IMO, this is part of the magic sauce of doing nodejs development. The huge repository of open source packages (through NPM) that are so easy to use make your development far more productive than developing everything from scratch yourself.
why not develop a package that serves me in my project the way I want instead of using packages where I won't use more than 5% of the functions in my project?
Unused code doesn't really cost you anything of consequence in a server-side environment. If you really want to, you can use a bundler that supports tree-shaking, which removes the code you're not using.
The question that really matters is whether an existing module meets your needs, or is close enough that you only have to write a little bit of code in order to use it. If that's the case, then the question becomes: "Why should I spend my precious development time writing a package from scratch, when I could spend far less time using something that is already available for free, already tested and already proven, and then spend the time I saved on other things that advance my product/service further?"
In many ways, this is really no different than using the fs module built into nodejs. You use it because it's already developed and already tested and saves you time over developing your own file access module. Yes, the fs module contains lots of code you may never need, but that's not the question. The question is whether it already contains the code you DO need.

how to find all of the options for a node package?

This is a general question about node modules. Every time I download a node module, I am scrambling online for hints about what options I can pass into it. On GitHub there only seem to be a few options shown as examples, but what if I want to see what other options are available and what they do? How do I do this? Is there a way in the command prompt to see all of the available options?
For example, how would I see the options for this:
https://www.npmjs.com/package/gulp-imagemin
The documentation for every Node module (package) is available on npm, e.g.:
https://www.npmjs.com/package/gulp-imagemin
By default what is displayed is the README.md file in the project. Sometimes it contains the entire documentation, sometimes it has links to other documents or websites.
But sometimes it can be empty or outdated, because modules and their documentation are usually created by people in their free time, with no obligation to keep them maintained or well documented.
If there is no documentation available or you think that the documentation is insufficient then you can either post an issue (usually on GitHub) or update the documentation and post a pull request.
See the documentation of a given module to know how to contribute or how to post issues. There should be links to issues and pull requests on the right of the module's page on npm.
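From the command line, npm docs <package> opens a package's documentation page and npm repo <package> opens its repository. Inside Node you can at least see a module's API surface. The sketch below (describeExports is a hypothetical name) lists exported names, their types and, for functions, the number of declared parameters (Function.length, which ignores default and rest parameters). It cannot tell you what an options object accepts; that still has to come from the README or the source.

```javascript
// Hypothetical helper: summarize what a module exports.
function describeExports(mod) {
  // A module may export a function that also carries properties.
  const target = typeof mod === "function" ? { "(module itself)": mod, ...mod } : mod;
  return Object.entries(target).map(([name, value]) =>
    typeof value === "function"
      ? `${name}: function with ${value.length} declared parameter(s)`
      : `${name}: ${typeof value}`
  );
}

// Usage: console.log(describeExports(require("path")));
```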
I agree with William with respect to the usability of node modules. While most modules have 'some' documentation on npmjs.com, and 'some' in the module's repository (if public, mostly GitHub), there is no standard form in which their capabilities are described. Also, in many cases, the documentation is not comprehensive.
Ideally I would expect npmjs.com to have a standard template with the details below. This would help accelerate the consumption and serviceability of a module when it is deployed in large and complex software systems.
A high level description of the module.
List of its most common use cases.
List of its most common (and desired) topologies
List of exposed APIs, with their input and expected output, side effects, assumptions.
Tips on debugging the potential issues.
Potential side effects (cache, memory, open fd's, leftover disc files, network access)
People can add or refine items which they think will improve the usability of modules, before we take it up with the npm community.

Managing multiple NPM modules browser-side

Okay, I've looked and looked and don't quite see a question that looks like mine, nor a project that quite addresses my need. This is probably because I'm doing something insane, and I am also asking for something difficult. But I wanted to see what others think.
I'm building my first large-scale single-page application. The way I've set it up is by breaking it up into a number of NPM modules. I like this because NPM provides a nice environment to build purely node-run unit tests in, a way to reuse some of my code for other projects we do at my company, and a forced separation of concerns. Here's the general idea:
A core data model library
A core UI library
A pseudo-library that provides individual UI components based on the above two…
…another of those…
…etc for each sub-application of my application
A very small central project that pulls all the components provided by the above together into an interface as necessary
This means a lot of libraries, and a number of dependencies that are common (Underscore, moment, EventEmitter2, etc).
Now I need to figure out how to get all that code into a browser. Ideally, I'd want something with some browserify characteristics (rolls modules and dependencies together into single files to cut down on resource callbacks), but has some of requirejs's asynchronous loading DNA (I'd rather not have to load my entire application up front; being able to call down chunks when the user navigates is useful).
I'm having trouble reconciling the above, though. I get what Require is trying to do, but every time I try to use it for already-built NPM modules (not AMD modules, though I'm happy to write that central project in an AMD-ish way) I get really confused and the sense that it's not really meant for me. For a single page application it seems like it's just going to resolve everything into one file anyway, since my dynamic resources are whole dependencies rather than individual files? And of course, Browserify is made with the sole intent of Hulk Smashing all your code into a single file. I could bundle each NPM module separately with Browserify, but then I'm duplicating the common dependencies for each.
I've looked at a bunch of other projects and they all seem to be addressing the client side more than the bundling side. What am I missing here?
[In pipe dream mode, I also like inject, partly because it's written by LinkedIn (who have a good reputation in my mind), but also for its localStorage caching.]
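The "call down chunks when the user navigates" behaviour described above can be expressed with dynamic import(), which bundlers such as webpack use as a split point to emit separate chunks. This is a hypothetical sketch of a cached loader: createLazyLoader, the sub-application names, the module paths and the mount function are all assumptions, and a failed load is evicted from the cache so it can be retried.

```javascript
// Hypothetical sketch: load each sub-application on first navigation and
// reuse the same promise afterwards.
function createLazyLoader(loaders) {
  const cache = new Map();
  return function load(name) {
    if (!Object.prototype.hasOwnProperty.call(loaders, name)) {
      return Promise.reject(new Error(`Unknown sub-application: ${name}`));
    }
    if (!cache.has(name)) {
      const pending = loaders[name]().catch((err) => {
        cache.delete(name); // allow a retry after a failed download
        throw err;
      });
      cache.set(name, pending);
    }
    return cache.get(name);
  };
}

// Usage (module paths and `mount` are hypothetical):
// const load = createLazyLoader({
//   billing: () => import("./billing/index.js"),
//   reports: () => import("./reports/index.js"),
// });
// load("billing").then((billing) => billing.mount(document.body));
```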

Modular programming and node

UPDATE 1: I made a lot of progress on this one. I pretty much gave up (at least for now, but maybe long term) on the idea of allowing user-uploaded modules. However, I am developing a structure so that several modules can be defined and loaded. A module will be initialised, set its own routes, and have a "public" directory for Javascript to be served. The more I see it, the more I realise that I can (should) also move the calls that are now system-wide into a module called "system".
UPDATE 2: I have made HUGE progress on this. I am about to commit tons of code on GitHub which will allow people to do really, really good modular programming (with modules exposing both client and server side code) using Node and Express. Please stay tuned.
UPDATE 3: I rewrote this thing as a system to register modules and enable them to communicate via an event/hooks system. It's coming along extremely nicely. I have tons of code already good to go -- I am just porting it to the new system. Feel free to have a look at the project on GitHub: https://github.com/mercmobily/hotplate
UPDATE 4: This is good. It turns out that my idea about a module being client AND server is really working.
UPDATE 5: The module is getting closer to something usable. I implemented a new loader which will take into account what an init() function will invokeAll() -- and will make sure that modules providing that hook will be loaded first. This opens up hotplate to a whole new level.
UPDATE 6: Hotplate is now close to 12000 lines of code. By the time it's finished, sometime in February, I imagine it will be close to 20000 lines of code. It does a lot of stuff, and it all started here on StackOverflow! I need it to develop my own SaaS, so I really need to get it finished by February (so that I can sprint to July and finish the first version of BookingDojo). Thanks everybody!
I am writing something that will probably turn into a pretty big piece of software. The short story is that it's nodejs + Express + Mongodb/Mongoose + Dojo (client side).
NOTE: Questions in this text are marked as [Q1], [Q2], etc.
Coming from a Drupal background (and knowing how coooomplex it has evolved, something I would like to avoid), I am a bit of a module freak. At the moment, I've done the application's boilerplate (hotplate: https://github.com/mercmobily/hotplate ). It does all of the boring stuff (users, workspaces, password reminder, etc.) and it's missing quite a few pieces.
I would like to come up with a design that will allow modules in a similar fashion as Drupal (but possibly better). That is:
Modules can define new routes, and handle them
Modules are installed system-wide, and then each workspace can enable a set list of them
The initial architecture could be something along those lines:
A "modules" directory, where there is one directory per module
Each module has a directory for "public" files for the Javascript side of things
Each module would have public/startup.js which would be included in the app's javascript
Each module would have server/node.js which would be included on the fly by the server if/when needed
There would be one route defined, something like /app/:workspaceid/modules/MODULE_NAME/.* with a middleware that checks if that workspace has MODULE_NAME enabled -- and if it does, calls a module's function with the passed parameter
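The gate described in the last point could look roughly like this. It is a hypothetical sketch written as a plain Express-style (req, res, next) function so it does not depend on Express; it assumes the route exposes :workspaceid and :moduleName parameters (wildcard route syntax differs between Express versions), and enabledModules and handlers are illustrative Maps.

```javascript
// Hypothetical sketch: only dispatch to a module's handler if the workspace
// has that module enabled.
// enabledModules: Map workspaceId -> Set of module names (assumption)
// handlers:       Map module name -> (req, res, next) handler (assumption)
function workspaceModuleGate(enabledModules, handlers) {
  return function (req, res, next) {
    const { workspaceid, moduleName } = req.params;
    const enabled = enabledModules.get(workspaceid);
    if (!enabled || !enabled.has(moduleName)) {
      res.statusCode = 404;
      return res.end("Module not enabled for this workspace");
    }
    const handler = handlers.get(moduleName);
    if (typeof handler !== "function") return next(); // enabled but not installed
    return handler(req, res, next);
  };
}
```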
[Q1]: Does this seem vaguely sane?
Issues:
I want to make this dynamic. I would like modules to be required when needed on the spot. This should be easy enough to do, by requiring things on the fly.
server/node.js would have a function called, but that function feels/looks an awful lot like a router itself
[Q2] Do you have any specific hints about this one?
These don't seem to be too much of a concern. However, the real question comes when you talk about security.
Privacy. This is a nasty one. At the moment, all the calls make the right queries to MongoDB, filtering by workspaceId. I would like to enforce some mechanism so that modules have no direct access to the database, and so that each module cannot access data that belongs to other workspaces.
User-defined modules. I would love to give users the ability to upload their own modules (and maybe make them available to other users). But, this effectively means allowing people to upload code that will be executed by node itself! How would you go about this?
[Q3] How would you go about these privacy/security issues? Is there any way for example to run the user-uploaded code in a sort of node sandbox? What about access to file system etc.?
Thanks!
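For the privacy point, one common pattern is to give each module a data-access object that is already bound to a workspace instead of the raw connection. This is a hypothetical sketch assuming a MongoDB-driver-style collection with find, findOne, updateOne, deleteOne and insertOne; scopedCollection is an illustrative name. It only helps if modules cannot reach the real connection some other way, and an update could still rewrite workspaceId unless you also reject that.

```javascript
// Hypothetical sketch: every filter and every inserted document gets the
// workspaceId forced onto it, overriding any value the caller supplied.
function scopedCollection(collection, workspaceId) {
  const scope = (filter = {}) => ({ ...filter, workspaceId });
  return Object.freeze({
    find: (filter, ...rest) => collection.find(scope(filter), ...rest),
    findOne: (filter, ...rest) => collection.findOne(scope(filter), ...rest),
    updateOne: (filter, update, ...rest) => collection.updateOne(scope(filter), update, ...rest),
    deleteOne: (filter, ...rest) => collection.deleteOne(scope(filter), ...rest),
    insertOne: (doc, ...rest) => collection.insertOne({ ...doc, workspaceId }, ...rest),
  });
}
```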
In the end, I answered this myself -- the hard way.
The answer: hotplate, https://github.com/mercmobily/hotplate
It does most of what I describe above. More importantly, with hotPlate (using hotPage and hotClientPages, available by default), you can write a module which
Defines some routes
Defines a "public" directory with the UI
Defines specific CSS and JS files that must be loaded when loading that module
Is able to add route-specific JSes if needed
Status:
I am accepting this answer as I am finished developing Hotplate's "core", which was the point of this answer. I still need to "do" things (for example, once I've written docs, I will make sure "hotplate" is the only directory in the module, without having an example server there). However, the foundation is there. In terms of "core", it's only really missing the "auth" side of the story (which will require a lot of thinking, since I want to make it so that it's db agnostic AND interfacing with passport). The Dojo widgets are a great bonus, although this framework can be used with anything (and in fact backbone-specific code would be sweeeeet).
What hotplate DOESN'T do:
What hotplate DOESN'T do is give users the ability to upload modules which will then be loaded into the application. This is extremely tricky. The client side wouldn't be so bad (the user could define Javascript to upload, and there could be a module to do that, no worries). The server side, however, is tricky at best. There are just too many things that can go wrong (the client might upload a blocking piece of code, or start reading the file system, they would have access to the full database, and so on).
Solutions to these issues are possible, but none of them are easy (you can cage the user's node environment and get it to run on a different port, for example), and some problems will remain. But there is always hope.
