I'd like to trace the execution path of an arbitrary Node.js program.
Specifically, I'd like to run a program (server or script), and have some sort of block-level (function call, loop, if statement) trace of the execution.
Constraints
Output must contain files, lines, and hit counts for every line executed
No code transformation or minification. Istanbul is great, but I want to keep the code that is executed in the end as readable as possible.
For long-running processes (servers, for example), I want to be able to see "current" line coverage (or as up-to-date as possible)
I don't want to lose any coverage data, so while profiling would give me some hints about which lines were hit, it's not really code coverage.
Things I don't care about
Exactly how the coverage is read. For example, it could be output to a file, it could be read via the code, etc.
Coverage format
Things I've investigated so far:
Using NODE_V8_COVERAGE:
I found that if I set the NODE_V8_COVERAGE environment variable to a directory, coverage data will be output to that directory when the program exits (here's a blog post on the creation of this feature).
The problem that I'm facing here is that I'm not sure there's a way to trigger the generation of these reports before the program terminates.
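For reference, enabling it is just an environment variable; Node.js then writes coverage JSON files into that directory when the process exits (the exact file name format may vary by Node.js version):

NODE_V8_COVERAGE=./coverage node server.js
# on exit: ./coverage/coverage-<pid>-<timestamp>-<n>.json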
Using inspector
I have also been experimenting with the Node.js inspector. I found a useful CPU profiler here. This could end up being helpful, but it works by sampling rather than hooking into the language, so I only get line numbers/counts for the parts of the code that were slow.
I also tried using Profiler.startPreciseCoverage, thinking that this might give me every line that was executed (the documentation wasn't clear on what it actually does). It didn't seem to be any more useful.
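For reference, my experiment looked roughly like this, using the built-in inspector module (the polling interval and snapshot file name are arbitrary choices of mine):

const fs = require('fs');
const inspector = require('inspector');

const session = new inspector.Session();
session.connect();

session.post('Profiler.enable', () => {
  // callCount + detailed should give deterministic, block-level counts
  // rather than samples.
  session.post('Profiler.startPreciseCoverage', { callCount: true, detailed: true }, () => {
    // takePreciseCoverage can be called while the process is still running.
    setInterval(() => {
      session.post('Profiler.takePreciseCoverage', (err, data) => {
        if (err) return console.error(err);
        // data.result: [{ url, functions: [{ ranges: [{ startOffset, endOffset, count }] }] }]
        // Note: ranges are character offsets into each script, not line numbers.
        fs.writeFileSync('coverage-snapshot.json', JSON.stringify(data.result));
      });
    }, 5000).unref();
  });
});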
Using Istanbul
I would like to avoid instrumenting code if possible.
Question
It seems like my options are limited, but at the same time this is only a result of my Googling for an hour or two.
Is there a better way to capture line coverage with the constraints listed above?
There's a pull request pending for Node.js to add functionality to programmatically start/stop/write V8 coverage information. If you are adventurous, you could use git to get the version of Node.js you want to use, apply the commits from the patch, and compile a Node.js binary.
If you clone the Node.js repository, the various versions of Node.js are tagged. So you can get the code for Node.js 12.19.0 by checking out the v12.19.0 tag.
You can cherry-pick the commits from the pull request normally, or you could use curl -L https://github.com/nodejs/node/pull/33807.patch | git am to apply the commits as patches.
Instructions for compiling/building the Node.js binary can be found at https://github.com/nodejs/node/blob/master/BUILDING.md#building-nodejs-on-supported-platforms.
More long term, you could chime in on the pull request on whether it meets your needs or not and hopefully get it going again. It seems to have stalled.
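Putting those steps together, the whole flow would look roughly like this (the tag and -j4 are just examples; see BUILDING.md for platform specifics):

git clone https://github.com/nodejs/node.git && cd node
git checkout v12.19.0                                              # pick the tag you want to build from
curl -L https://github.com/nodejs/node/pull/33807.patch | git am   # apply the PR's commits
./configure && make -j4
out/Release/node --version                                         # the freshly built binary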
Related
I tried using NodeJS in a server-side script to parse the text content in local PDF files using pdf-parse, which in turn uses Mozilla's amazing PDF parser. Everything worked wonderfully in my dev sandbox, but the whole thing came crashing down on me when I attempted to use the same code in production.
My problem was caused by the sheer number of PDF files I'm trying to process asynchronously: I have more than 100K files that need processing, and Mozilla's PDF parser is (understandably) unconditionally asynchronous, so the OS killed my Node process because of too many open files. I had started by writing all of my code asynchronously (the preliminary part where I search for PDF files to parse), but even after refactoring all the code for synchronous operation, it still kept crashing.
The gist of the problem is related to the cost of the operations: walking the folder structure to look for PDF files is cheap, whereas actually opening the files, reading their contents and parsing them is expensive. So Node kept generating new promises for each file it encountered, and the promises were never fulfilled. If I tried to run the code manually on smaller folders, it worked like a charm – really fast and reliable. As soon as I tried to execute the code on the entire folder structure it crashed, no matter what.
I know Node enthusiasts always answer questions like these by saying the OP is using the wrong programming pattern, but I'm stumped as to what would be the correct pattern in this case.
You need to control how many simultaneous asynchronous operations you start at once. This is under your control. You don't show your code, so we can only advise conceptually.
For example, if you look at this answer:
Promise.all consumes all my RAM
It shows a function called mapConcurrent() that iterates an array, calling an asynchronous, promise-returning function for each element while keeping no more than a maximum number of async operations "in flight" at any given time. You can tune that number of concurrent operations to your situation.
Another implementation here:
Make several requests to an API that can only handle 20 request a minute
with a function called pMap() that does something similar.
There are other such implementations built into libraries such as Bluebird and Async-promises.
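If you'd rather not pull in a library, a minimal sketch of such a limiter looks something like this (mapWithLimit and the usage names are made up for illustration; the linked mapConcurrent() is more complete):

// Run fn(item) over items with at most `limit` operations in flight at once.
async function mapWithLimit(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0;

  async function worker() {
    while (next < items.length) {
      const i = next++;                // claim the next index
      results[i] = await fn(items[i]);
    }
  }

  // Start `limit` workers that all pull from the shared index.
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, worker));
  return results;
}

// e.g. parse at most 20 PDFs at a time instead of opening all 100K at once:
// const texts = await mapWithLimit(pdfFiles, 20, file => parsePdf(file));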
I'm writing a crawler module which calls itself recursively to download more and more links, depending on a depth option parameter passed in.
Besides that, I'm doing more tasks on the returned resources I've downloaded (enriching/changing them depending on the configuration passed to the crawler). This process goes on recursively until it's done, which might take a lot of time (or not) depending on the configuration used.
I wish to optimize it to be as fast as possible without hindering any Node.js application that uses it.

I've set up an Express server where one of the routes launches the crawler for a user-defined (query string) host. After launching a few crawling sessions for different hosts, I've noticed that I sometimes get really slow responses from other routes that only return simple text. The delay can be anywhere from a few milliseconds to something like 30 seconds, and it seems to happen at random times (well, nothing is random, but I can't pinpoint the cause).

I've read a JetBrains article about CPU profiling using the V8 profiler functionality integrated with WebStorm, but unfortunately it only shows how to collect the information and how to view it; it doesn't give me any hints on how to find such problems, so I'm pretty much stuck here.
Could anyone help me with this matter and guide me? Any tips on what my crawler might be doing that could hinder the Express server (a lot of recursive calls?), or on how to find the hotspots I'm looking for and optimize them?
It's hard to say anything more specific on how to optimize code that is not shown, but I can give some advice that is relevant to the described situation.
One thing that comes to mind is that you may be running some blocking code. Never use deep recursion without using setTimeout or setImmediate to break it up and give the event loop a chance to run once in a while. (Note that process.nextTick won't help here: nextTick callbacks run before the event loop is allowed to continue, so recursing through process.nextTick still starves I/O.)
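As a sketch (crawlOne() and the queue are placeholder names for whatever your crawler actually does):

// Direct recursion would monopolize the event loop; scheduling each
// step with setImmediate lets pending HTTP requests and other Express
// handlers run between iterations.
function crawl(queue) {
  if (queue.length === 0) return;
  const url = queue.shift();
  crawlOne(url, queue);              // may push newly found links onto the queue
  setImmediate(() => crawl(queue));  // yield to the event loop before the next step
}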
I use shake to build a bunch of static webpages, which I then have to upload to a remote host, using sftp. Currently, the cronjob runs
git pull # get possibly updated sources
./my-shake-system
lftp ... # upload
I’d like to avoid running the final command if shake did not actually rebuild anything. Is there a way to tell shake “Run command foo, after everything else, and only if you changed something!”?
Or alternatively, have shake report whether it did something in the process exit code?
I guess I could add a rule that depends on all possibly generated files, but that seems redundant and error prone.
Currently there is no direct/simple way to determine if anything built. It's also not as useful a concept as in simpler build systems, since certain rules (especially those that define storedValue to return Nothing) will always "rerun", but then very quickly decide they don't need to run the rules that depend on them; to Shake, that is the same as rerunning. I can think of a few approaches; which one is best probably depends on your situation:
Tag the interesting rules
You could tag each interesting rule (one that produces something that needs uploading) with an action that writes to a specific file. If that specific file exists, then you need to upload. This works slightly better than an exit code would: if you do multiple Shake runs, and something changes in the first but nothing does in the second, the file will still be present. If it makes sense for your setup, use an IORef instead of a file.
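On the cron side this could look like the following (.needs-upload is a made-up name for the flag file your tagged rules would write):

git pull
./my-shake-system
if [ -f .needs-upload ]; then
    lftp ...            # upload
    rm .needs-upload    # clear the flag once the upload has succeeded
fi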
Use profiling
Shake has quite advanced profiling. If you pass shakeProfile=["output.json"] it will produce a JSON file detailing what built and when. Runs are indexed by an Int, with 0 for the most recent run, and any runs that built nothing are excluded. If you have one rule that always fires (e.g. write to a dummy file with alwaysRerun) then if anything fired at the same time, it rebuilt.
Watch the .shake.database file size
Shake has a database, stored under shakeFiles. On each uninteresting run it will grow by a fairly small amount (~100 bytes), but a fixed size for your system. If it grows by a greater amount, then the run did something interesting.
Of these approaches, tagging the interesting rules is probably the simplest and most direct (although does run the risk of you forgetting to tag something).
I am writing a Rails 3.1 app, and I have a set of three cucumber feature files. When run individually, as with:
cucumber features/quota.feature
-- or --
cucumber features/quota.feature:67 # specifying the specific individual test
...each feature file runs fine. However, when all run together, as with:
cucumber
...one of the tests fails. It's odd because only one test fails; all the other tests in the feature pass (and many of them do similar things). It doesn't seem to matter where in the feature file I place this test; it fails if it's the first test or way down there somewhere.
I don't think it can be the test itself, because it passes when run individually or even when the whole feature file is run individually. It seems like it must be some effect related to running the different feature files together. Any ideas what might be going on?
It looks like there is coupling between your scenarios. Your failing scenario assumes the system is in some state. When the scenario runs individually, the system is in that state, so the scenario passes. But when you run all the scenarios, the scenarios that ran before it change that state, and so it fails.
You should solve this by making your scenarios completely independent: the work of one scenario shouldn't influence the results of any other. This is highly encouraged in The Cucumber Book and in Specification by Example.
I had a similar problem and it took me a long time to figure out the root cause.
I was using @selenium tags to test jQuery scripts on a Selenium client.
My page had an Ajax call that was sending a POST request. I had a bug in the JavaScript, and the POST request was failing. (The feature wasn't complete and I hadn't yet written steps to verify the result of the Ajax call.)
This error was recorded in Capybara.current_session.server.error.
When the following non-Selenium feature was executed, a Before hook within Capybara called Capybara.reset_sessions!
This then called
def reset!
  driver.reset! if @touched
  @touched = false
  raise @server.error if @server and @server.error
ensure
  @server.reset_error! if @server
end
@server.error was not nil for each scenario in the following feature(s), and Cucumber reported each step as skipped.
The solution in my case was to fix the ajax call.
So Andrey Botalov and Doug Noel were right. I had carry-over from an earlier feature.
I had to keep debugging until I found the exception that was being raised and investigate what was generating it.
I hope this helps someone else who didn't realise they had carry-over from an earlier feature.
During our nightly builds, we pull down the latest committed checkins from a multi-site source control repository, merge our local source changes on top, and compile/build. This leaves us with modified dates and content on many files. When I arrive the next morning and click in my IntelliJ IDEA window, IDEA rebuilds the index over the source files. With IntelliJ IDEA 10, this happens in the background and the speed is supposed to be much better. While waiting, I can do many (but not all) operations on the source.
Doing the indexing in the background is great, but can I run a command to make IntelliJ IDEA reindex the files as the last step of my nightly build? That way, the reindexing is complete and ready before I get to the office.
I suppose killing and restarting IDEA would work, but seems a bit harsh, and I'd want to be certain no edits were unsaved at the time. FYI, running on Debian Linux.
Thanks,
Alan
Actually, IDEA should detect external changes automatically via fsnotifier and index the modified files. If that doesn't happen, you can use File | Synchronize. There is also Settings | General | Synchronize files on frame activation; with this option enabled, minimizing and restoring the IDEA window will force file synchronization.
An external tool that finds the IDEA window, minimizes it, and restores it will therefore force synchronization; run it as your last build step.
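Since you're on Linux, a small script using xdotool could do the minimize/restore trick (assuming xdotool is installed, the window title matches, and the synchronize-on-frame-activation option above is enabled):

WIN=$(xdotool search --name "IntelliJ IDEA" | head -n 1)
xdotool windowminimize "$WIN"
sleep 1
xdotool windowactivate "$WIN"   # restoring focus triggers the synchronization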
You can also write a simple IDEA plug-in which listens on some TCP port and invokes the Synchronize action, then make a tool which connects to that port and sends a command to force synchronization from outside IDEA. Run this tool as the last build step.
With such a plug-in you will have more control over IDEA, and it would be possible to invoke other actions if necessary, like restarting IDEA, opening a project, etc.