How to trigger profiling in NodeJS at runtime? - node.js

We have a very stateful NodeJS based web server (Meteor) that occasionally, randomly becomes slow in production. The problem is not reproducible in any of our tests, and we don't know what's triggering it.
To diagnose this, we are using the v8-profiler package. This lets us trigger a 10-second CPU profile and download it for offline analysis.
Despite not having received any commits in 3 years, the package used to work fairly well. It has given us compilation trouble in the past, and now it looks like it stopped compiling entirely, breaking our build. The build happens inside a Docker container with all versions pinned, including NodeJS and v8-profiler itself, so it's unlikely that we can fix this on our end.
I'm thinking there must be some alternative, better maintained approach. But where is it?
(Note that restarting the server with additional flags (like --profile) is not an option, because it destroys all the evidence of the problem.)

I found v8-profiler-next, which is a successor to v8-profiler.
I hope this works for you.
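For reference, v8-profiler-next keeps essentially the same API as the original v8-profiler, so triggering a 10-second CPU profile at runtime looks roughly like the sketch below (a minimal sketch based on my reading of the package; captureCpuProfile is just an illustrative helper name, and the trigger itself, e.g. an authenticated HTTP route or a signal handler, is whatever you already use):
const fs = require('fs')
const v8Profiler = require('v8-profiler-next')

// generate profiles in the newer format that recent Chrome DevTools expects
v8Profiler.setGenerateType(1)

function captureCpuProfile(durationMs = 10000) {
  const title = `cpu-${Date.now()}`
  v8Profiler.startProfiling(title, true) // true = record samples
  setTimeout(() => {
    const profile = v8Profiler.stopProfiling(title)
    profile.export((err, result) => {
      if (!err) fs.writeFileSync(`${title}.cpuprofile`, result)
      profile.delete() // release the profile data held by V8
    })
  }, durationMs)
}
The resulting .cpuprofile file can be downloaded and loaded into Chrome DevTools for offline analysis, which matches the workflow described in the question.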

I just built a tool for this, called ntop, so it's like "top" but for Node apps: https://github.com/DVLP/ntop
The code below enables communication with the CLI. It is designed to add no overhead when the CLI tool is not in use, so it can be used in production. The profiler attaches only while the CLI is actually profiling, and disconnects immediately afterwards.
The app:
import * as ntop from 'ntop'
ntop()
CLI shortcut to get a list of PIDs for convenience:
npx ntop
This outputs PIDs along with the command used to start each process, for easier recognition.
Process detected at 12345 Details: node ./src/index.js --port 8216
npx ntop 12345
This outputs a list similar to the "Bottom Up" view in Chrome DevTools:
(garbage collector) | 16.101ms |
shift | 10.038ms | node:internal/priority_queue:98:7
(anonymous) | 9.192ms | file:///home/app/src/controllers/Server.js:24:29
utils.bulkPreparePacket | 4.924ms | file:///home/app/src/Utils.js:91:26
preparePacket | 4.776ms | file:///home/app/src/Model.js:98:54
baseGetTag | 1.727ms | file:///home/app/node_modules/lodash/lodash.js:3104:23
(anonymous) | 1.702ms | evalmachine.:3:14
isPrototype | 1.441ms | file:///home/app/node_modules/lodash/lodash.js:6441:24
(program) | 1.411ms |
percolateDown | 1.124ms | node:internal/priority_queue:40:15

Related

`docker-compose` force run old image

I was running the very good LinuxServer.io Unifi Controller Docker image on my Raspberry Pi 3.
Unfortunately, as of 2022-06-01 this image no longer supports ARM32.
I didn't realise this when I ran docker-compose pull to update to the latest image, and now my controller won't work, with the error message:
unifi-controller | ********************************************************
unifi-controller | ********************************************************
unifi-controller | * *
unifi-controller | * !!!! *
unifi-controller | * This Unifi-Controller image does not support *
unifi-controller | * 32 bit ARM due to a lack of OS packages *
unifi-controller | * *
unifi-controller | * *
unifi-controller | ********************************************************
unifi-controller | ********************************************************
Is there any way to pin docker-compose back to the pre-deprecation version?
When I run docker image ls, I still have the following images available on my system:
REPOSITORY TAG IMAGE ID CREATED SIZE
lscr.io/linuxserver/unifi-controller latest deeabba24529 10 days ago 102MB
lscr.io/linuxserver/unifi-controller <none> 048ec856c236 9 months ago 524MB
lscr.io/linuxserver/unifi-controller <none> 4858fc11dcf2 10 months ago 520MB
Or I could adjust the version in docker-compose.yml to select an old version perhaps.
I understand the risks of running old software, but the newer 64-bit Raspberry Pi 4s are out of stock in my country, so my ability to upgrade the hardware right away is limited, and I need access to my network configuration.
Just set the image: configuration for the relevant containers in your docker-compose.yaml to a specific version. Instead of:
image: lscr.io/linuxserver/unifi-controller:latest
use something like:
image: lscr.io/linuxserver/unifi-controller:arm32v7-7.3.76
or whichever version is appropriate. Using the latest tag is often considered an anti-pattern for exactly this reason: upgrades to a new major version can break your application stack. In most cases it's better to pin your docker-compose.yml to a specific version.
Most image repositories have a browseable interface for discovering available tags. I'm not familiar with the lscr.io repository, but if there's not a convenient web interface you can use skopeo:
skopeo list-tags docker://lscr.io/linuxserver/unifi-controller

How do I troubleshoot Azure Linux App Service when logs do not include error messages

I have been running an Azure Linux App Service that hosts a NodeJS app built with the LoopBack 4 framework for quite some time now, but since last week or so I'm having trouble deploying a new version of the application using Azure DevOps.
Last week, when a version was deployed, it took several restarts for the app to start. Yesterday it took around 7 hours before the application was available, and today (at the time of writing) it has been 3 hours.
This is currently only in my development environment (prod will be deployed only on pull request), but I think the same will happen to my production environment when I deploy a new version there. Unfortunately, I can't try this at this time.
When I open the log stream, I don't see any errors besides Waiting for response to warmup request for container.
I don't know if it has something to do with a timeout for starting the application, since I'm getting that error message in the "Diagnose & solve problems" screen, but when I run my application on my development machine, it boots in less than 5 seconds.
I tried setting WEBSITES_CONTAINER_START_TIME_LIMIT to 1800 via:
portal > app service > Configuration > Application settings
a DevOps job for setting app settings
But this doesn't have the desired result, as I'm still seeing the Waiting for response to warmup request for container message.
In the "Diagnose & Solve problems" screen in the Azure portal, I also have an error for container crash. I would expect that I would see some kind of error as for why it fails, but all I see is the following output:
Container qusito-core-dev_0_fdc9a431 couldn't be started: Logs =
[Azure App Service ASCII-art startup logo]
2020-09-11T13:43:59.410953753Z A P P S E R V I C E O N L I N U X
2020-09-11T13:43:59.410957153Z
2020-09-11T13:43:59.410960353Z Documentation: http://aka.ms/webapp-linux
2020-09-11T13:43:59.410963553Z NodeJS quickstart: https://aka.ms/node-qs
2020-09-11T13:43:59.410966853Z NodeJS Version : v10.14.2
2020-09-11T13:43:59.410970153Z Note: Any data outside '/home' is not persisted
2020-09-11T13:43:59.410973453Z
2020-09-11T13:44:00.211504840Z Cound not find build manifest file at '/home/site/wwwroot/oryx-manifest.toml'
2020-09-11T13:44:00.211887043Z Could not find operation ID in manifest. Generating an operation id...
2020-09-11T13:44:00.211897743Z Build Operation ID: 7cf16daf-ccef-4ad1-b496-50778dafc913
2020-09-11T13:44:02.114481171Z Writing output script to '/opt/startup/startup.sh'
2020-09-11T13:44:02.490713708Z Running #!/bin/sh
2020-09-11T13:44:02.491320612Z
2020-09-11T13:44:02.491332412Z # Enter the source directory to make sure the script runs where the user expects
2020-09-11T13:44:02.491336512Z cd "/home/site/wwwroot"
2020-09-11T13:44:02.491340212Z
2020-09-11T13:44:02.491344012Z export NODE_PATH=$(npm root --quiet -g):$NODE_PATH
2020-09-11T13:44:02.491347512Z if [ -z "$PORT" ]; then
2020-09-11T13:44:02.493136724Z export PORT=8080
2020-09-11T13:44:02.493148924Z fi
2020-09-11T13:44:02.493152724Z
2020-09-11T13:44:02.493341325Z PATH="$PATH:/home/site/wwwroot" npm run start
2020-09-11T13:44:06.600726640Z npm info it worked if it ends with ok
2020-09-11T13:44:06.600776740Z npm info using npm@6.9.0
2020-09-11T13:44:06.600875440Z npm info using node@v10.14.2
2020-09-11T13:44:06.868226973Z npm info lifecycle core@1.0.0~prestart: core@1.0.0
2020-09-11T13:44:06.885230583Z
2020-09-11T13:44:06.885248883Z > core@1.0.0 prestart /home/site/wwwroot
2020-09-11T13:44:06.885253583Z > npm run build
2020-09-11T13:44:06.885264583Z
2020-09-11T13:44:07.662776822Z npm info it worked if it ends with ok
2020-09-11T13:44:07.663855429Z npm info using npm@6.9.0
2020-09-11T13:44:07.664656534Z npm info using node@v10.14.2
2020-09-11T13:44:07.823510864Z npm info lifecycle core@1.0.0~prebuild: core@1.0.0
2020-09-11T13:44:07.824905373Z npm info lifecycle core@1.0.0~build: core@1.0.0
2020-09-11T13:44:07.835657242Z
2020-09-11T13:44:07.835673743Z > core@1.0.0 build /home/site/wwwroot
2020-09-11T13:44:07.835678743Z > lb-tsc
2020-09-11T13:44:07.835682343Z
At this time I'm stuck with this issue. I saw a reference somewhere on SO that logging in Linux is not completely supported in the log stream (in Azure portal), but I don't know where to find any other logs.
Any help on how I should move forward is appreciated.
You can manage your App Service through Console or Kudu in Advanced Tools.
As it turns out, I had a typo in my settings: WEBSITE_CONTAINER_START_TIME_LIMIT (I forgot the “S” after WEBSITE).
After changing this, I didn't have any problems deploying. I also noticed that the lb-tsc command takes around 5-6 minutes to run, which is probably what went wrong with the earlier deployments.

gcloud app deploy does not remove previous versions

I am running a Node.js app on Google App Engine, using the following command to deploy my code:
gcloud app deploy --stop-previous-version
My desired behavior is for all instances running previous versions to be terminated, but they always seem to stick around. Is there something I'm missing?
I realize they are not receiving traffic, but I am still paying for them and they cause some background telemetry noise. Is there a better way of running this command?
The example output of gcloud app instances list (screenshot not reproduced here) showed two different versions running.
We accidentally blew through our free Google App Engine credit in less than 30 days because of an errant flexible instance that wasn't cleared by subsequent deployments. When we pinpointed it as the cause it had scaled up to four simultaneous instances that were basically idling away.
tl;dr: Use the --version flag when deploying to specify a version name. An existing instance with the same version will be replaced the next time you deploy.
That led me down the rabbit hole that is --stop-previous-version. Here's what I've found out so far:
--stop-previous-version doesn't seem to be supported anymore. It's mentioned under Flags on the gcloud app deploy reference page, but if you look at the top of the page where all the flags are listed, it's nowhere to be found.
I tried deploying with that flag set to see what would happen but it seemingly had no effect. A new version was still created, and I still had to go in and manually delete the old instance.
There's an open Github issue on the gcloud-maven-plugin repo that specifically calls this out as an issue with that plugin but the issue has been seemingly ignored.
At this point our best bet is to add --version=staging or whatever to gcloud app deploy. The reference docs for that flag seem to indicate that it'll replace an existing instance that shares that "version":
--version=VERSION, -v VERSION
The version of the app that will be created or replaced by this deployment. If you do not specify a version, one will be generated for you.
(emphasis mine)
Additionally, Google's own reference documentation on app.yaml (the link's for the Python docs but it's still relevant) specifically calls out the --version flag as the "preferred" way to specify a version when deploying:
The recommended approach is to remove the version element from your app.yaml file and instead, use a command-line flag to specify your version ID
As far as I can tell, for Standard Environment with automatic scaling at least, it is normal for old versions to remain "serving", though they should hopefully have zero instances (even if your scaling configuration specifies a nonzero minimum). At least that's what I've seen. I think (I hope) that those old "serving" instances won't result in any charges, since billing is per instance.
I know most of the above answers are for Flexible Environment, but I thought I'd include this here for people who are wondering.
(And it would be great if someone from Google could confirm.)
I had the same problem as the OP. Using the flex environment (some of this also applies to the standard environment) with Docker (runtime: custom in app.yaml), I've finally solved this! I tried a lot of things and I'm not sure which one fixed it (or whether it was a combination), so I'll list the things I did here, with the most likely solutions listed first.
SOLUTION 1) Ensure that cloud storage deletes old versions
What does cloud storage have to do with anything? (I hear you ask)
Well there's a little tooltip (Google Cloud Platform Web UI (GCP) > App Engine > Versions > Size) that when you hover over it says:
(Google App Engine) Flexible environment code is stored and billed from Google Cloud Storage ... yada yada yada
So based on this info and this answer I visited GCP > Cloud Storage > Browser and found my storage bucket AND a load of other storage buckets I didn't know existed. It turns out that some of the buckets store cached cloud functions code, some store cached docker images and some store other cached code/stuff (you can tell which is which by browsing the buckets).
So I added a deletion policy to all the buckets (except the cloud functions bucket) as follows:
Go to GCP > Cloud Storage > Browser and click the link (for the relevant bucket) in the Lifecycle Rules column > Click ADD A RULE > THEN:
For SELECT ACTION choose "Delete Object" and click continue
For SELECT OBJECT choose "Number of newer versions" and enter 1 in the input
Click CREATE
This will return you to the table view and you should now see the rule in the lifecycle rules column.
REPEAT this process for all relevant buckets (the relevant buckets were described earlier).
THEN delete the contents of the relevant buckets. WARNING: Some buckets warn you NOT to delete the bucket itself, only the contents!
Now re-deploy and your latest version should now get deployed and hopefully you will never have this problem again!
SOLUTION 2) Use deploy flags
I added these flags
gcloud app deploy --quiet --promote --stop-previous-version
This probably doesn't help since these flags seem to be the default but worth adding just in case.
Note that for the standard environment only (I heard on the grapevine) you can also use the --no-cache flag which might help but with flex, this flag caused the deployment to fail (when I tried).
SOLUTION 3)
This probably does not help at all, but I added:
COPY app.yaml .
to the Dockerfile
TIP 1)
This is probably more of a helpful / useful debug approach than a fix.
Visit GCP > App Engine > Versions
This shows all versions of your app (1 per deployment) and it also shows which version each instance is running (instances are configured in app.yaml).
Make sure all instances are running the latest version. This should happen by default. Probably worth deleting old versions.
You can determine your version from the gcloud app deploy logs (at the start of the logs) but it seems that the versions are listed by order of deployment anyway (most recent at top).
TIP 2)
Visit GCP > App Engine > Instances
SSH into an instance. This is just a matter of clicking a few buttons in the console (the screenshot is not reproduced here). Once you have SSH'd in, run:
docker exec -it gaeapp /bin/bash
Which will get you into the docker container running your code. Now you can browse around to make sure it has your latest code.
Well I think my answer is long enough now. If this helps, don't thank me, J-ES-US is the one you should thank ;) I belong to Him ^^
Google may have updated their documentation cited in @IAmKale's answer:
Note that if the version is running on an instance of an auto-scaled service, using --stop-previous-version will not work and the previous version will continue to run because auto-scaled service instances are always running.
Seems like that flag only works with manually scaled services.
This is a supplementary and optional answer in addition to my other main answer.
In addition to my other answer, I am now bumping the version myself on every deploy using a script.
My script contents are below.
Basically, the script auto-increments the version every time you deploy. I am using node.js, so the script uses npm version to bump the version, but this line could easily be tweaked for whatever language you use.
The script requires a clean git working directory for deployment.
The script assumes that when the version is bumped, this will result in file changes (e.g. changes to package.json version) that need pushing.
The script essentially tries to find your SSH key and if it finds it then it starts an SSH agent and uses your SSH key to git commit and git push the file changes. Else it just does a git commit without a push.
It then does a deploy using the --version flag ... --version="${deployVer}"
Thought this might help someone, especially since the top answer talks a lot about using the --version flag on a deploy.
#!/usr/bin/env bash
projectName="vehicle-damage-inspector-app-engine"
# Find SSH key
sshFile1=~/.ssh/id_ed25519
sshFile2=~/Desktop/.ssh/id_ed25519
sshFile3=~/.ssh/id_rsa
sshFile4=~/Desktop/.ssh/id_rsa
if [ -f "${sshFile1}" ]; then
  sshFile="${sshFile1}"
elif [ -f "${sshFile2}" ]; then
  sshFile="${sshFile2}"
elif [ -f "${sshFile3}" ]; then
  sshFile="${sshFile3}"
elif [ -f "${sshFile4}" ]; then
  sshFile="${sshFile4}"
fi
# If SSH key found then fire up SSH agent
if [ -n "${sshFile}" ]; then
  pub=$(cat "${sshFile}.pub")
  for i in ${pub}; do email="${i}"; done
  name="Auto Deploy ${projectName}"
  git config --global user.email "${email}"
  git config --global user.name "${name}"
  echo "Git SSH key = ${sshFile}"
  echo "Git email = ${email}"
  echo "Git name = ${name}"
  eval "$(ssh-agent -s)"
  ssh-add "${sshFile}" &>/dev/null
  sshKeyAdded=true
fi
# Bump version and git commit (and git push if SSH key added) and deploy
if [ -z "$(git status --porcelain)" ]; then
  echo "Working directory clean"
  echo "Bumping patch version"
  ver=$(npm version patch --no-git-tag-version)
  git add -A
  git commit -m "${projectName} version ${ver}"
  if [ -n "${sshKeyAdded}" ]; then
    echo ">>>>> Bumped patch version to ${ver} with git commit and git push"
    git push
  else
    echo ">>>>> Bumped patch version to ${ver} with git commit only, please git push manually"
  fi
  deployVer="${ver//"."/"-"}"
  gcloud app deploy --quiet --promote --stop-previous-version --version="${deployVer}"
else
  echo "Working directory unclean, please commit changes"
fi
For node.js users, if you call the script deploy.sh you should add:
"deploy": "sh deploy.sh"
to the scripts section of your package.json and deploy with npm run deploy.
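In context, that scripts entry sits in package.json roughly like this (a sketch; any other scripts you already have stay alongside it):
{
  "scripts": {
    "deploy": "sh deploy.sh"
  }
}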

How do you get more details about errors when running `heroku local`

When I run heroku local to test my worker process, I get this:
forego | starting worker.1 on port 5000
worker.1 | /myproject/source-file.js:28 # in red
worker.1 | ad (module.js:343:32) # in red
The last two lines are in red, indicating an error, but neither the stack trace nor the error message is visible.
The reason it's important that I use heroku local instead of node source-file.js is that I have a local environment file called .env whose keys the local process needs, and I'm not sure how to feed it into node.
How can I expand on those red messages to find out what failed?
I found that the simple way to do this is to use foreman start instead. It is actually mentioned in the Heroku Local guide, and .env gets read in the same way heroku local does it.
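As a side note, if you just want to run node source-file.js directly so the full error and stack trace print to the terminal, one common way to load a .env file yourself is the dotenv package (this is an assumption about your setup, not something the Heroku tooling requires):
// at the very top of source-file.js, before anything reads process.env
// (assumes: npm install dotenv, and a standard KEY=value .env file)
require('dotenv').config()

// ...the rest of source-file.js is unchanged
After that, node source-file.js runs with the same environment variables that heroku local would have provided.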

libpthread version mismatch on HP-UX

I have a multithreaded application that works fine on my development server (HP-UX). When it is deployed to the client's server, it gives the following error:
. Process 16448. Starting (CONT) Thread: 0 /usr/lib/pa20_64/dld.sl:
Unsatisfied code symbol 'pthread_create' in load module 'bin/CCQO'.
Killed
Using the nm command, I found that libpthread.1 on the customer's server does not have the pthread_create symbol. This is from the client server:
/usr/lib/pa20_64 > nm -g libpthread.1 | grep 'pthread_cre'
[475] | 4611686018427436512| 1116|FUNC |GLOB |0|.text|__pthread_create_system
But when I run the same command on my development server, I get the following output:
[733] | 4611686018427467256| 2160|FUNC |GLOB |0| .text|__pthread_create_generic
[712] | 4611686018427467192| 64|FUNC |GLOB |0| .text|__pthread_create_system
[625] | 4611686018427467112| 64|FUNC |WEAK |0| .text|pthread_create
I tried to copy the libraries from my server to the client server, but it does not work.
Please let me know: how can I find out the version of the threading library on my machine and on the client machine?
How can I update the client machine with an updated library?
Can copying my libraries to the other server solve the issue? If yes, then what are the steps?
Using the nm command, I found that libpthread.1 on the customer's server does not have the pthread_create symbol.
Your customer's libpthread.1 is corrupt. Perhaps the customer ran strip on it, or damaged it in some other way.
Can copying my libraries to other server, solve the issue ?
This can render the machine un-bootable, and you should almost certainly not do that.
The right solution is for the customer's sysadmin to restore system libraries from his HP-UX media.
