Node.JS: test code vs. production code organization - node.js

From a cursory look in several node.js projects on Github I noticed that the common convention is to put test files under a ./spec directory (exact name may vary: ./tests, ./specs, etc.). Let's call this the "classic" project organization.
On the other hand, there is also (at least theoretically) the "localizing" organization: each test file is in the same directory as the production file it tests (e.g., under ./controllers we will have login_controller.js as well as login_controller.spec.js).
In order to avoid theological battles on this clearly subjective topic I will ask concrete questions:
Has anyone saw major modules/apps using the localizing organization?
Are there hard drawbacks/limitations to the localizing organization? by "hard" I mean something along the lines of "well, Heroku does not include the specs/ directory in its deployment bundle (a.k.a slug) so the classic organization has a smaller footprint on the server".
Are there testing frameworks (Mocha, jasmine-node, and co.) that somehow impose the "classic" scheme?

No, but that is up to your organizational preference. I personally would prefer a test directory.
Nope. Heroku includes everything in your slug that it receives, the only things that are excluded are the ones excluded from git in your .gitignore file.
Not 100% sure, but generally no, test frameworks do not impose a structure on your code. They simply provide the tools for you to write your tests using the structure you want.

Consistency here is more important than the direction you decide to go. There are a few relatively minor issues that I see going with test files next to source files.
Potential code navigation issues. I could see you sometimes opening
the wrong file on accident. Opening the test file when you mean to
open the source file, etc. This would happen more often if those
files are side by side and the only difference is .spec in the
filename.
Potential issues with unit test runner. Most unit test runners seem
to prefer a folder of tests by default. I'm sure you could configure
them to look across the entire project, but it depends on the test
runner.
Potentially slower unit test automation. Since your test files are
mixed across your entire project, the test runner will have to scan
your entire project for test files instead of a dedicated directory.
For a large code base, this could mean it takes longer for your test
suite to complete. Chances are the difference in speed is quite
small however.
As I said, these are minor issues and you can definitely work around them, but it does add some friction. You have to weigh the potential downsides with what you see as the benefits of having your test files reside next to your source files.

Related

Maintaining a service required by two different apps

I have two node apps running on my server, each performing different tasks.
However, I now need to create a service that is going to be used by both of them. Obviously I don't want to create it in both of the apps, hence creating two codes to maintain.
My current thought is to have a separate repository only for this service, then require it from each app as an outsourced module.
I was wondering whether there are better methods, or if this method might encounter problems I'm not seeing
Well if you strictly follow the rule that shared means only shared things in the common package, I don't see any issues with that. The problem comes when you try to put the logic in one repo which is supposed to be only used for one. In that scenario you will need to rebuilt both apps as the repo or package is depedency of both.
One issue that I have seen people face is when they work with shared repo is that when you need to tweak things just because they are at common place. for example you have a method that does one job and suddenly you want to use that in other place as well but with a tweak. In that case you end up modifying the shared code to support the second repo but since it is shared, you will have to do regression testing of both apps.
I see shared repo candidates being drivers, client etc. I guess rest is up to your project structure and judgement. In this case there is nothing correct or incorrect. Hope this is making sense.

Partial packages in Continuous Delivery

Currently we are running a C# (built on Sharepoint) project and have implemented a series of automated process to help delivery, here are the details.
Continuous Integration. Typical CI system for frequent compilation and deployment in DEV environment.
Partial Package. Every week, a list of defects accompanied fixes is identified and corresponding assemblies are fetched from the full package to form a partial package. The partial package is deployed and tested in subsequent environments.
In this pipeline, there are two packages going through are being verified. Extra effort is used to build up a new system (web site, scripts, process, etc) for partial packages. However, some factors hinder its improvement.
Build and deploy time is too long. On developers' machines, every single modification on assemblies triggers around 5 to 10 minute redeployment in IIS. In addition, it takes 15 minutes (or even more) to rebuild the whole solution. (The most painful part of this project)
Geographical difference. Every final package is delivered to another office, so manual operation is inevitable and package size is preferred to be small.
I will be really grateful to have your opinions to push the Continuous Delivery practices forward. Thanks!
I imagine the reason that this question has no answers is because its scope is too large. There are far too many variables that need to be eliminated, but I'll try to help. I'm not sure of your skill level either so my apologies in advance for the basics, but I think they'll help improve and better focus your question.
Scope your problem to as narrow as possible
"Too long" is a very subjective term. I know of some larger projects that would love to see 15 minute build times. Given your question there's no way to know if you are experiencing a configuration problem or an infrastructure problem. An example of a configuration issue would be, are your projects taking full advantage of multiple cores by being built parallel /m switch? An example of an infrastructure issue would be if you're trying to move large amounts of data over a slow line using ineffective or defective hardware. It sounds like you are seeing the same times across different machines so you may want to focus on configuration.
Break down your build into "tasks" and each task into the most concise steps possible
This will do the most to help you tune your configuration and understand what you need to better orchestrate. If you are building a solution using a CI server you are probably running using a command like msbuild.exe OurProduct.sln which is the right way to get something up and running fast so there IS some feedback. But in order to optimize, this solution will need to be broken down into independent projects. If you find one project that's causing the bulk of your time sink it may indicate other issues or may just be the core project that everything else depends on. How you handle your build job dependencies is dependent up your CI server and solution. Doing it this way will create more orchestration on your end, but give faster feedback if that's what's required since you're only building the project that had the change, not the complete solution.
I'm not sure what you mean about the "geographical difference" thing. Is this a "push" to the office or a "pull" from the offices? This is a whole other question. HOW are you getting the files there? And why would that require a manual step?
Narrow your scope and do multiple questions and you will probably get better (not to mention shorter and more concise) answers.
Best!
I'm not a C# developer, but the principles remain the same.
To speed up your builds, it will be necessary to break your application up in smaller chunks if possible. If that's not possible, then you've got bigger problems to attack right now. Remember the principles of API's, components and separation of concerns. If you're not familiar with these principles, it's definitely worth the time to learn about them.
In terms of deployment - great that you've automated it, but it sounds exactly the same as you are building a big-bang deployment. Can you think of a way to deploy only deltas to the server(s), are do you deploy a single compressed file? Break it up if possible.

How to prevent malicious *.js scripts from executing in Node.js

I'm using Node.js to create the web service. In the implementation, I consumed many third party modules which are installed via npm. There is security issue if there is malicious *.js scripts in the consumed modules. For example, the malicious code may delete all my disk files, or collect the secret data in silence.
I have a couple of questions regarding this.
How to detect if there is security issue in the module?
What should I do to prevent malicious *.js scripts from executing in Node.js?
I'm very appreciate if you can share any experience to build the node.js service.
Thanks,
Jeffrey
One concern you did not raise is that a module might try to make a direct connection to your database itself, or to other services on your internal network. This might be prevented by setting passwords which the module cannot find so easily.
1. Restricting disk access
This project was presented at NodeConf last year. It attempts to restrict filesystem access in precisely the situation you describe.
https://github.com/yahoo/fs-lock
"The goal for this module is to help when you are loading 3rd party modules and you need to restrict their access."
It sounds rather like the proposal Jeffrey made in the comments in Plato's answer.
(If you want to look further into hooking OS calls, this hookit project may present a few ideas. Although in its current form it only wraps the callback function, it might provide inspiration of what to hook, and how. Here is an example of it being used.)
2. Analyse flow of sensitive data
If you are only worried about data-stealing (not filesystem or database access), then you can focus your concerns:
You should be most concerned about those packages which are being passed sensitive data. Presumably some of the data on your web-service is presented to the public anyway!
Most packages will not have access to the full stack of your application, only the bits of data you pass them. If a package is only being passed a small amount of sensitive data, and never passed the rest of the data, it may not be able to do anything malicious with the data it receives. (For example, if you pass all your usernames to one package for processing and all your addresses to a different package, that is a much smaller concern than if you pass all your usernames, addresses and credit-card numbers to the same package!)
Identify the sensitive data in your app, and note which functions in which modules they are passed to.
3. Perform efficient code review
You may not need to go to Github to read the code. The great majority of packages provide all their source-code in their install folder inside node_modules. (There are a few packages which provide binaries however; these are naturally harder to verify.)
If you do want to check the code yourself, there may ways to reduce the amount of work involved:
To secure your own app, you do not need to read the entire source code of all packages in your project. You only need to review those functions which are actually called.
You may trace the code by reading it, or with the aid of a text-based debugger, or a GUI debugger. (Of course you should look out for branching, where different inputs may cause different parts of the module to be called.)
Set breakpoints when you call into a module which you don't trust, so you can step through the code that is called and see what it does. You may be able to conclude that only a small part of the module is used, so only that code needs to be verified.
Whilst tracing flow should cover concerns about sensitive data at runtime, to check for file access or database access, we should also look at the initialisation code of each module which is required, and all calls (including requires) which are made from there.
4. Other measures
It might be wise to lock the version number of each package in package.json so that you don't accidentally install a new version of a package until you decide that you need to.
You may use social factors to build confidence in a package. Check the respectability of the author. Who is he, and who does he work for? Do the author and his employers have a reputation to uphold? Similarly, who uses his project? If the package is very popular, and used by industry giants, it is likely that others have already reviewed the code.
You may wish to visit github and enable notifications for all the top-level modules you are using, by "watching" the repository. This will inform you if any vulnerabilities are reported in the package in future.
Most (all?) modules have source code available on Github, you can read through the source and look for security problems, or hire a security professional to do the job.
I just take the risk - although I tend to use popular packages with hundreds of commits, active maintenence, and issue lists.
If your project dependency tree is large enough, reviewing all of your dependencies is not a feasible long-term strategy.
The original answer from Joey has some good countermeasures you can use for specific scenarios. I've also seen https://github.com/berstend/node-safe - could make you slightly safer on mac.
A general solution to the problem is taking shape though.
How to protect a project from malicious packages
make sure you don't run lifecycle (postinstall) scripts unless they're known and necessary (see my talk on this topic)
put 3rdparty code in a compartment, lock down the environment, decide on which powerful APIs to pass to each package.
The second step requires the use of Compartment, which is a work-in-progress in TC39 https://github.com/tc39/proposal-compartments/
But a shim exists. And Some tooling was built on top of that shim.
You could use the SES-shim directly and implement your own controls, or use the convenience of LavaMoat
LavaMoat lets you generate and tweak a per-package policy where you can decide which globals and builtins it should have access to.
LavaMoat also offers a tool to manage install scripts.
Here's my talk on SES and LavaMoat with a demo at the end.
How to set up LavaMoat
See LavaMoat docs for more details
disable/allow dependency lifecycle scripts (eg. "postinstall") via #lavamoat/allow-scripts
npm i --ignore-scripts -D #lavamoat/allow-scripts
npx --no-install allow-scripts setup
npx --no-install allow-scripts auto
then, edit the allow-list in package.json
after every insstall/reinstall run allow-scripts
run your server or build process in lavamoat-node
npm i -D lavamoat
in your package.json add something like:
"scripts": {
"lavamoat-policy": "lavamoat app.js --autopolicy",
"start": "lavamoat app.js"
run lavamoat-policy every time you make changes to your dependency tree and review the policy (see also: policy override)
run npm start to start your app
Disclaimer: I contribute to LavaMoat and Endo. They are Open Source projects on permissive licenses.

Managing multiple NPM modules browser-side

Okay, I've looked and looked and don't quite see a question that looks like mine, nor a project that quite addresses my need. This is probably because I'm doing something insane, and I am also asking for something difficult. But I wanted to see what others think.
I'm building my first large-scale single-page application. The way I've set it up is by breaking it up into a number of NPM modules. I like this because NPM provides a nice environment to build purely node-run unit tests in, a way to reuse some of my code for other projects we do at my company, and a forced separation of concerns. Here's the general idea:
A core data model library
A core UI library
A psuedo-library that provides individual UI components based on the above two…
…another of those…
…etc for each sub-application of my application
A very small central project that pulls all the components provided by the above together into an interface as necessary
This means a lot of libraries, and a number of dependencies that are common (Underscore, moment, EventEmitter2, etc).
Now I need to figure out how to get all that code into a browser. Ideally, I'd want something with some browserify characteristics (rolls modules and dependencies together into single files to cut down on resource callbacks), but has some of requirejs's asynchronous loading DNA (I'd rather not have to load my entire application up front; being able to call down chunks when the user navigates is useful).
I'm having trouble reconciling the above, though. I get what Require is trying to do, but every time I try to use it for already-built NPM modules (not AMD modules, though I'm happy to write that central project in an AMD-ish way) I get really confused and the sense that it's not really meant for me. For a single page application it seems like it's just going to resolve everything into one file anyway, since my dynamic resources are whole dependencies rather than individual files? And of course, Browserify is made with the sole intent of Hulk Smashing all your code into a single file. I could bundle each NPM module separately with Browserify, but then I'm duplicating the common dependencies for each.
I've looked at a bunch of other projects and they all seem to be addressing the client side more than the bundling side. What am I missing here?
[In pipe dream mode, I also like inject, partly because it's written by LinkedIn (who have a good reputation in my mind), but also for its localStorage caching.]

Should CruiseControl.NET be used to handle tasks that are not related to building source?

Weird question, perhaps. We have a number of simple utilities written in-house that need to be run on an automated basis. These are not build jobs. Just things like running SendOutHourlyEmailAlarms.exe, KeepFoldersInSynch.exe and such. I would normally set these things up as simple scheduled tasks/AT commands (or a Windows Service if more granular control is needed over the scheduling), but a co-worker has set up a number of these tasks as build projects on the CruiseControl.NET server. I asked him why he set these up this way and his response was that the executions (and their logs, return values, thrown exceptions) were all tracked and logged and that this information was accessible through an organized interface on the build server website. I couldn't argue with this.
But this just has a smell that I can't quite identify. Is this a proper use of CruiseControl.NET? If not, what are the dangers? Even if it may fit the bill, aren't there other products better suited for this type of thing?
We have all sorts of non-build related tasks for the exact same reason as your coworker had, I want one spot to look up any and all jobs I need run.
Some Examples of our CC.NET projects:
FTP installers to Remote QA
Creating Source Code Documentation
Create VM's with the installers
installed for QA in the morning
Archiving Installers
Pretty much anything I have to do by hand more than once, becomes a project. IMHO it is much better than a scheduled task for one other reason as well. Our config files are in source control, so we have 1 place to make adjustments. We do not have to log into multiple servers and make adjustments or wonder which server did that.
I think your coworker has made a good argument. If these tasks are related to the development process, then placing them in CruesControl.Net as a project seems acceptable. I would draw the line at utilizing a development server to run production processes though. Although it is true that "If the only tool you have is a hammer, you tend to see every problem as a nail," it doesn't mean that the hammer isn't capable of solving a lot of problems!
Just because a tool is designed to solve a particular problem does not mean that it will not have equal facility at solving similar problems outside the scope originally concieved by the tool creator. If CruiseControl.NET solves these problems well, then it is absolutely the appropriate tool to use.

Resources