How to speed up the build of front-end monorepo projects?

Here are some of my current thoughts, but I would also like to understand how the community is handling this today
Use caching to reduce build time
Motivation
Currently, every time someone changes something in the libs, everyone else who pulls those changes through git needs to initialize again. If you know which package changed, you can just run the initialize command for that specific package. If you don't, you need to run the root initialize command, which is very slow because it re-runs the initialize script of every npm package that defines one, regardless of whether anything has changed.
Reduce initialize time and improve the collaborative development experience
Support CI/CD caching of built libs to speed up build time
Current state
lerna run
First initialization of a new project: 198.54s
Re-initialization of an existing project: 90.17s
Requirements
Execute commands as concurrently as possible according to the dependency graph, and implement caching based on git changes
Execute commands in a specified module and in the modules it depends on
Execute commands in all submodules
Implementation ideas
The lerna API is currently severely under-documented: https://github.com/lerna/lerna/issues/2013
Scan lerna.json to find all modules
Run the specified commands with as much parallelism as the dependency graph allows
When running a command, check git to see whether it can be skipped (see the sketch after this list)
Schematic: https://photos.app.goo.gl/aMyaewgJxkQMhe1q6
ultra-runner already exists, so it may be better to build this feature on top of it
There seems to be no support for running commands other than build at the moment, see: https://github.com/folke/ultra-runner/issues/224
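A rough sketch of the "check git and skip" idea, using plain Node; the package paths, the .init-hash cache file, and the initialize script name are placeholders of mine, not anything lerna provides:

// init-if-changed.js
const { execSync } = require('child_process');
const crypto = require('crypto');
const fs = require('fs');
const path = require('path');

function trackedStateHash(pkgDir) {
  // `git ls-files -s` prints the blob hash of every tracked file under pkgDir,
  // so the result changes whenever committed/staged content changes
  // (unstaged edits are not caught; use something like `git stash create` if that matters)
  const out = execSync(`git ls-files -s -- ${pkgDir}`, { encoding: 'utf8' });
  return crypto.createHash('sha1').update(out).digest('hex');
}

function initIfChanged(pkgDir) {
  const cacheFile = path.join(pkgDir, '.init-hash');
  const current = trackedStateHash(pkgDir);
  const previous = fs.existsSync(cacheFile) ? fs.readFileSync(cacheFile, 'utf8') : null;
  if (previous === current) {
    console.log(`skipping ${pkgDir}, no changes since last initialize`);
    return;
  }
  execSync('npm run initialize', { cwd: pkgDir, stdio: 'inherit' });
  fs.writeFileSync(cacheFile, current);
}

// package list is a placeholder; a real tool would read it from lerna.json
['libs/ui', 'libs/utils'].forEach(initIfChanged);

The same hash comparison could be layered under lerna run or ultra-runner, with the parallelism driven by the dependency graph.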
Known solutions
Trying Microsoft Rush -- but hoping for a better solution that doesn't require replacing the main lerna + yarn setup

Related

NodeJS CI/CD Build Missing Files

I am setting up CI/CD for a NodeJS project and occasionally the developer forgets to send up a file (module) to source control. I run npm ci and npm test without problem and the application gets deployed to my server. However, it will error out once executed due to the missing module.
Is there a best practice for ensuring that all files required by a node application are available before allowing it to be deployed?
I don't know your exact configuration but here is how my teams have handled similar issues in the past:
Unit Tests. Normally, a CI system can catch this type of error before you deploy. If your CI tests aren't flagging your missing module before the code gets deployed, then a commonly used solution would be to write a test that ensures the module is present. That way the problem would be automatically caught when your unit tests are run. Something along these lines might work:
// mymodule.test.js using Mocha syntax:
import { expect } from 'chai';
import mymodule from './mymodule';

describe('my module', () => {
  it('should export something!', () => {
    expect(!!mymodule).to.be.true;
  });
});
Version Control workflow. It sounds like there is also an issue here with your team's version control workflow. Generally, all of the files required to build the production application should be kept under version control and developers should be committing frequently. In this situation, I would normally do some investigation to see what is happening -- it might require training or perhaps the app is structured in a way that is overly confusing or complex to engineers.
Use a lockfile for npm packages. If the missing module is an npm module, then there are a few things that could cause it to be missing. Generally, all npm modules should be listed in your package-lock.json or yarn.lock file. This ensures that the production version of the application will be in sync with what developers are using locally. I personally discourage developers from installing npm modules globally unless it is absolutely necessary. In that situation, your CI server (and perhaps your production server) will need to be updated to include the exact same versions of the global packages.
Automated build systems. I think you are indicating that your problem is caused by modules that aren't under source control. But I have also seen some situations where legacy build systems might omit a file that is important when building the app for production. Modern build tools like Webpack and Babel normally include every module referenced by the application but older solutions like grunt and gulp might require some fine-tuning to ensure the files are always included in an automated way (to avoid a situation where developers are expected to manually add the modules to the build system and often forget to do so).
The best way to prevent this is to have the developer get the project's checksum and compare that with source control and/or your server. If the checksums match, then all of the files were transferred.
If you (or your deployment software/service) are using rsync in your deployment process and use the -C parameter then some directories might be getting filtered out.
I had a similar CI/CD issue where npm packages used the directory name "core" and it was ignored due to the -C parameter.
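For illustration, a sketch of that failure mode and one way around it; the host and paths are made up:

# -C / --cvs-exclude silently drops anything on the CVS default ignore list,
# which includes names like "core", so a node_modules/.../core directory vanishes
rsync -avzC ./dist/ deploy@example.com:/var/www/app/

# safer: skip -C and spell out your own excludes
rsync -avz --exclude='.git/' ./dist/ deploy@example.com:/var/www/app/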
Replace all of your require() calls with webpack import calls, and have your build run webpack. At runtime, node will run the bundle instead of your regular entrypoint.
Webpack will catch all unreachable imports during build time.
All this is assuming the missing files are modules (code) rather than resources (e.g. JSON files).
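For the webpack route, a minimal sketch; the entry path, output folder, and target are assumptions about the project layout:

// webpack.config.js
const path = require('path');

module.exports = {
  mode: 'production',
  target: 'node',                    // bundling for Node, not the browser
  entry: './src/index.js',           // the app's regular entrypoint
  output: {
    path: path.resolve(__dirname, 'dist'),
    filename: 'bundle.js',
  },
  // no `externals` here on purpose: keeping node_modules in the bundle means a
  // missing npm package also fails the build instead of blowing up at runtime
};

The build then fails on any unresolved import, and the server runs node dist/bundle.js instead of the original entrypoint.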

How can I run a script before installing a new Node.js dependency

I tried using the preinstall npm script, but it only runs when I check out the project into a new workspace and run "npm i" on its own.
I need a way to run a script before a new dependency is written into package.json. The type of dependency (dev or prod) doesn't matter: all of them need to be checked.
For example, when a new developer joins the team and wants to add a dependency with a known vulnerability, the script should stop the action before package.json is changed and show the developer a warning message.
There isn't a way to do that with npm scripts. So, unless you feel like implementing one you're going to have to adjust your process. Start by identifying all the problems you're trying to address with an on-dependency-install hook.
You give the example of preventing the installation of a dependency or dependency version. That's not a problem: it's a solution you've identified for the problem. Figure out what the actual problem is, and then reevaluate your solution to see if it's really the most appropriate measure to take.
Possibly (probably) you are afraid of vulnerable code making it up to production. That's a problem definition you can work with. What possible solutions exist? You've already identified the blacklist. But not only is that not supported by your tooling, even if it were the onus is then on you to keep the blacklist up to date. Given just how quickly the Node world moves, that's enough work to keep several people employed fulltime. And that's not even getting into deploying it to your developers.
The good news is that that's not the only solution: you could establish procedural safeguards against integrating vulnerable code. If you're using a distributed VCS like Git, pull requests are right there: disable pushing commits to the master or development branch, have developers work in feature branches and submit pull requests, then review those pull requests and screen any new dependencies for vulnerabilities when they show up. If you're using something like SVN, you can use feature branches with code reviews to similar effect. Your developers get extra eyes on their code looking for vulnerabilities, optimizations, edge cases, and so forth; you don't waste time screening dependencies that nobody ever tries to integrate. And nobody has to worry about getting the latest copy of the blacklist. For this particular scenario, everybody wins with a process solution over a technical solution.
If you have other reasons for wanting to fire scripts when dependencies are installed, try working back to the root of the problem the same way. The way Node dependency management and module interactions work, you'll probably discover it's preferable to develop better process habits.
If you are using git, you can use pre-commit/push hooks, the result is pretty much the same, no vulnerabilities in code base.
For example, with husky and nsp you could do something like this:
{
  "scripts": {
    "prepush": "nsp check"
  }
}
Riffing off Gabriel's suggestions, since you are concerned about devs wasting time when the lib they add fails an nsp check... You can use an editor extension to run the nsp check as they code. Then have Husky do a pre-commit nsp check as well.
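Sticking with the same husky convention as the prepush example above, the pre-commit variant would just be another script entry (a sketch, assuming the classic husky script-name hooks):

{
  "scripts": {
    "precommit": "nsp check",
    "prepush": "nsp check"
  }
}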
I would also recommend Greenkeeper.io to prevent vulnerabilities before they are found.
If the main concern is that these vulnerable packages are being run within your network (since there's no way to prevent those devs from using those packages in general), you could mirror a subset of the npm registry that you consider safe, or manually add known safe dependencies to that mirror, and block access to the main registry https://registry.npmjs.org/ at the network level. This would mean your developers are stuck waiting for the mirror to be updated, but would require somebody to at least stop and think before they're able to install a problematic module.
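As a sketch, pointing npm at such a mirror is just a registry setting; the URL below is hypothetical:

# .npmrc (checked into the project, or set per machine)
registry=https://npm-mirror.internal.example.com/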

When to add a dependency? Are there cases where I should rather copy the functionality?

I lately helped out on a project, where I added a really small dependency - in fact, it only contained a regular expression (https://www.npmjs.com/package/is-unc-path).
The feedback I got from the developer of the project was that he tries to minimize third-party dependencies if they can be implemented easily - whereby he - if I understand it correctly - asks me to just copy the code instead of adding another dependency.
To me, adding a new dependency looks just like putting some lines of code into an extra file in the repo. In addition, the developers will get informed by an update if the code needs a change.
Is it just a religious thought that drives a developer to do this? Are there maybe any costs (performance- or space-wise, etc) when adding a dependency?
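For concreteness, here is an illustrative version of what "just copy the code" would mean in this case; this is not the actual source of is-unc-path, only a sketch of the same idea:

// isUncPath.js -- illustrative only
function isUncPath(filepath) {
  // UNC paths look like \\server\share\rest-of-path
  return /^\\\\[^\\]+\\[^\\]+/.test(filepath);
}

isUncPath('\\\\server\\share\\file.txt'); // true
isUncPath('C:\\temp\\file.txt');          // false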
I also had a dispute with my manager once concerning third-party libraries; the problem was even greater because he came to believe that you should version the node_modules folder.
The source of most conflicts is usually ignorance.
His arguments were:
you should deliver a working product to the client without requiring him to do any extra work like npm install
if GitHub or npm is down at the moment you run npm install on the server, what will you do?
if a library that you install has a bug, who will be responsible?
My arguments were:
versioning node_modules is not going to work because of how package dependencies are resolved: each library pulls in its own node_modules dependencies, so your git repository rapidly grows to hundreds of MB. Deploys become slower and slower, since downloading half a GB of code every time takes time. npm also uses a caching mechanism, so if nothing has changed it will not pointlessly re-download code.
the left-pad incident was painful, but after that npm introduced a lockfile mechanism, and now every package can be pinned to an exact version and integrity hash.
And GitHub and npm are not single-instance services; they run in the cloud.
When installing a dependency you always have some criteria in mind, and there are community best practices, which usually come down to: 1. Does the repo have unit tests? 2. How many downloads does it have? 3. When was it last updated?
The Node.js ecosystem is built on modularity; Node is not popular by luck, but because of how it was designed for creating and reusing modules. Sometimes working in a Node.js environment feels like putting Lego pieces together to build your toy. This is the main reason development in Node.js is so fast: people just reuse stuff.
In the end he stuck to his own ideas, and I left the project :D

Does the case for not including Node modules in version control also apply to Composer packages?

In doing research on whether Node's node_modules should be checked into your version control repository, the most recent consensus seems to be that you should include it for applications you deploy.
Sources:
Check in node_modules vs. shrinkwrap
Should I check in node_modules to git when creating a node.js app on Heroku?
https://www.npmjs.org/doc/misc/npm-faq.html#Should-I-check-my-node_modules-folder-into-git
Reading these arguments led me to question whether Composer's /vendor directory should also be checked into version control. Composer's documentation suggests that you don't:
Should I commit the dependencies in my vendor directory?
The general recommendation is no. The vendor directory [...] should be added to .gitignore/svn:ignore/etc.
The best practice is to then have all the developers use Composer to install the dependencies. Similarly, the build server, CI, deployment tools etc should be adapted to run Composer as part of their project bootstrapping.
While it can be tempting to commit it in some environment, it leads to a few problems:
Large VCS repository size and diffs when you update code.
Duplication of the history of all your dependencies in your own VCS.
[...]
Contrasting that argument is this one (source):
Doesn’t checking in node_modules create a lot of noise in the source tree that isn’t related to my app?
No, you’re wrong, this code is used by your app, it’s part of your app, pretending it’s not will get you in to trouble. You depend on other people’s code and they are just as likely to write new bugs as you are, probably more so. Checking all of that code in to source control gives you a way to audit every line that ever changed in your application. It allows you to use $ git bisect locally and be ensured that it’s the same as in production and that every machine in production is identical. No more tracking down unknown changes in dependencies, all the changes, in every line, are viewable in source control.
In summary, the question is this: Why would one gitignore (i.e. not version control) node_modules but not do the same for Composer's vendor/ directory?
The reasons to commit external dependencies are
it's easier to deploy with git pull
the code used is directly included in the commit anyone checks out
Arguments against this are
Git is no deployment tool
all dependency managers do have a way to make exactly sure the code used can be fetched
I don't know anything about npm, but for Composer that last point is implemented by committing composer.lock.
I don't think the "audit code" argument is valid in every case. If you write software that itself gets audited, and consequently needs all of its libraries audited, then probably all code changes do need to be preserved. This isn't true in the general case.
git bisect still works just as well with a committed composer.lock. It does require installing the dependencies at every bisect step, but that is just one simple step, probably already part of the automated test suite run.
The last thing to worry about is when used packages go offline. This really is more of a problem with Composer, because there is no central hosting of the downloadable releases (npm probably does this). If this is a problem, either commit the code (and try to figure out how to update that missing package in the future), or setup an instance of Satis to create a local copy of the code you use.
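A sketch of the conventional Composer setup described above; the ignore entry and command are the standard ones, but treat the exact layout as an assumption:

# .gitignore
/vendor/

# composer.json and composer.lock stay under version control;
# every developer, CI job, and deploy step then runs
composer install   # restores exactly the versions recorded in composer.lock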
Putting all your modules in your VCS makes it really heavy to download or upload. For example, I work on two Node.js projects, and the total size of the node_modules directories is between 250MB and 500MB, whereas the whole codebase with assets is generally less than 40MB. Every Node.js developer likes Node's lightness, so the code must stay easy to download and share.
For the second point, an alternative to avoid regressions is to be more restrictive with the dependency versions in your package.json. You will find more information here.
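For example, pinning exact versions instead of caret ranges; the package names and versions here are just placeholders:

{
  "dependencies": {
    "express": "4.17.1",
    "lodash": "~4.17.21"
  }
}

An exact version such as "4.17.1" never moves; "~4.17.21" allows patch releases only, whereas the default caret range ("^4.17.1") would also pull in new minor versions.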
Finally, the best argument is to take a look at the well-known modules everybody knows:
express ignores node_modules
mocha ignores node_modules
q ignores node_modules
...
The more I research this the more I'm starting to think that there is no one correct answer to this, just different opinions as well as pros and cons of each method.
This blog, looking from the context of Bower, does a good job of weighing the pros and cons of each: http://addyosmani.com/blog/checking-in-front-end-dependencies/.
In short: At least for right now, weigh the pros and cons and decide what best fits your situation.

RPM - Install time parameters

I have packaged my application into an RPM package, say, myapp.rpm. While installing this application, I would like to receive some inputs from the user (an example for input could be - environment where the app is getting installed - "dev", "qa", "uat", "prod"). Based on the input, the application will install the appropriate files. Is there a way to pass parameters while installing the application?
P.S.: A possible solution could be to create an RPM package for each environment. However, in our scenario, this is not a viable option since we have around 20 environments and we do not wish to have 20 different packages for the same application.
In general, RPM packages should not require user interaction. Time and time again, the RPM folks have stated that it is an explicit design goal of RPM to not have interactive installs. For packages that need some sort of input before first use, you typically ask for this information on first use, or you put it all in config files with macros or something and tell your users that they will have to configure the application before it is usable.
Even passing a parameter of some sort counts as end-user interaction. I think what you want is to have your pre or install scripts auto detect the environment somehow, maybe by having a file somewhere they can examine. I'll also point out that from an RPM user's perspective, having a package named *-qa.rpm is a lot more intuitive than passing some random parameter.
For your exact problem, if you are installing different content, you should create different packages. If you try to do things differently, you're going to end up fighting the RPM system more and more.
It isn't hard to create a build system that can spit out 20+ packages that are all mostly similar. I've done it with a template-ish spec file and some scripts run by make that will create the various spec files and build the RPMs. Without knowing the specifics, it sounds like you might even have a core package that all 20+ environment packages depend on, then the environment specific packages install whatever is specific to their target environment.
You could use the relocate option, e.g.
rpm -i --relocate /env=/uat somepkg.rpm
and have your script look up the variable data from a file located in the "env" directory
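A rough sketch of how a scriptlet could pick up the relocated prefix, assuming the package is built relocatable and relying on rpm exporting RPM_INSTALL_PREFIX to scriptlets of relocatable packages; the file names are assumptions:

# spec file (sketch)
Prefix: /env

%post
# after `rpm -i --relocate /env=/uat somepkg.rpm` this expands to /uat
ENV_NAME=$(basename "$RPM_INSTALL_PREFIX")
ln -sf "$RPM_INSTALL_PREFIX/conf/app-$ENV_NAME.conf" /etc/myapp/app.conf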
I think this is a very valid question, especially once you move into the application development realm. There, configuring the application for different target systems is your daily bread: you need to configure for Development, Integration Test, Acceptance Test, Production, etc. I don't think building a separate package for each environment is the solution; basically it should be the same code running in different environments.
I know that this requirement is not supported by rpm. But as a workaround you can use a simple config file that the %pre script knows to look for. The config file could be a simple shell script that, for example, sets environment variables, which the various pre and post scripts can then use.
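A rough sketch of that workaround; the path and variable name are placeholders:

%pre
# the environment is chosen by dropping a small file on the box before installing
if [ -f /etc/myapp/install-env.sh ]; then
    . /etc/myapp/install-env.sh        # e.g. sets APP_ENV=uat
fi
: "${APP_ENV:=dev}"                    # fall back to dev when nothing is configured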
