Prometheus Pushgateway returning metrics of stopped MIG instances along with running instances - cron

MIG instances in GCP keep getting scaled up and down, each instance gets a new name every time it is created, and MIGs don't have static IPs, so I am trying Prometheus Pushgateway to collect metrics. But after setting everything up I am getting metrics of the scaled-down instances along with the scaled-up ones, and the scaled-down instances' metrics show up as flat lines. I don't want metrics from scaled-down instances; is there a way to automatically remove only the stopped instances' metrics without disturbing the running instances?
I have created a .sh script that checks for stopped instances and skips sending their metrics, but it is not working. I have also created a crontab entry to run the script, but that is not working either.
#!/bin/bash
# Get the list of active MIG instances
active_instances=$(gcloud compute instances list --filter="status=RUNNING" --format="value(name)")

PUSHGATEWAY_SERVER=http://Address   # (which I have mentioned in actual script)

# Get the list of existing metrics from the Pushgateway
existing_metrics=$(curl -s "$PUSHGATEWAY_SERVER/metrics" | grep "node-exporter" | awk '{print $NF}')

# Loop through each active instance and push its metrics
for instance in $active_instances; do
  NODE_NAME=$instance
  # Check if the instance is part of a MIG (instances created by a MIG carry a
  # "created-by" metadata entry referencing an instanceGroupManager) and if it is still running.
  # Note: describe may need --zone if your instances span multiple zones.
  if gcloud compute instances describe "$NODE_NAME" --format="value(metadata.items)" | grep -q "instanceGroupManagers"; then
    instance_status=$(gcloud compute instances describe "$NODE_NAME" --format="value(status)")
    if [ "$instance_status" == "RUNNING" ]; then
      # Collect the local node-exporter metrics and push them under this instance's name
      curl -s localhost:9100/metrics | curl --data-binary @- "$PUSHGATEWAY_SERVER/metrics/job/node-exporter/instance/$NODE_NAME"
    else
      # If the instance is not running, delete its metrics from the Pushgateway
      metric_to_delete=$(echo "$existing_metrics" | grep "$NODE_NAME")
      if [ -n "$metric_to_delete" ]; then
        curl -X DELETE "$PUSHGATEWAY_SERVER/metrics/job/node-exporter/instance/$NODE_NAME"
      fi
    fi
  fi
done
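To have stale series removed automatically, one option is a small cleanup script run from cron that deletes every Pushgateway group whose instance label no longer matches a RUNNING instance. This is only a sketch, assuming metrics were pushed under /metrics/job/node-exporter/instance/<name> as in the script above; it relies on the push_time_seconds bookkeeping metric the Pushgateway exposes for each pushed group.

#!/bin/bash
# Sketch: delete Pushgateway groups for instances that are no longer RUNNING.
PUSHGATEWAY_SERVER=http://Address   # replace with your Pushgateway address

running=$(gcloud compute instances list --filter="status=RUNNING" --format="value(name)")

# Instance names currently known to the Pushgateway, taken from the
# push_time_seconds metric it exposes for every pushed group
pushed=$(curl -s "$PUSHGATEWAY_SERVER/metrics" \
  | grep '^push_time_seconds{' \
  | grep -o 'instance="[^"]*"' \
  | cut -d'"' -f2 | sort -u)

for name in $pushed; do
  if ! echo "$running" | grep -qx "$name"; then
    echo "Deleting stale group for $name"
    curl -s -X DELETE "$PUSHGATEWAY_SERVER/metrics/job/node-exporter/instance/$name"
  fi
done

A crontab entry such as */5 * * * * /path/to/cleanup.sh >> /var/log/pushgateway-cleanup.log 2>&1 (path and schedule are placeholders) would then keep only the running instances' metrics on the Pushgateway.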

Related

High process count on cPanel hosted node.js app, and increasing

I have a single node.js app running on my shared web host. The cPanel shows 67/100 processes, and 7 entry processes.
The thing is, the site currently doesn't do anything except let users view it.
The number of processes when I first deployed the app a week ago was only 11/100, but it keeps rising gradually for no apparent reason.
I was wondering if my code has an issue that could be causing this. It is fairly simple, but there may be something I do not see.
My entire project is hosted on github at https://github.com/ravindukrs/HackX-Jr-Web
===================
What I tried
I stopped the app from cPanel, but the number of processes didn't go down. It did slightly reduce the CPU usage, though.
Note: CPU usage remains 0/100 even when the app is running.
I am not a great developer, so the code may not be optimized, but I was wondering if I am creating any processes that do not end.
The site is currently hosted at https://hackxjr.lk
Thank you in advance.
Update: Count is still going up
Here is my experience with this same problem and how I resolved it.
I set up a simple Node.js app on my Namecheap host and about a day later my whole domain was unavailable. I checked cPanel and noticed 200/200 processes were running. So I contacted support and they said this:
As I have checked, there are a lot of stuck NodeJS processes, I will take necessary actions from my end and set up a cron job for you that will remove the stuck processes automatically, so you won't face such issue again. Please give me 7-10 minutes.
Here is the cron job they set up:
ps faux | grep lsnode | grep -v 'grep' > $HOME/tmp_data; IFS=$'\n'; day=$(date +%d); for i in $(cat $HOME/tmp_data); do for b in $i; do echo $i | awk -F[^0-9]* '{print $2" "$9}' | awk -v day1=$(date +%d) '{if($2+2<day1)print$1}' | xargs kill -9 && echo "NodeJS process killed"; done; done >/dev/null 2>&1
I have not had an issue since.
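For what it's worth, here is a more readable sketch of what that one-liner appears to do: find lsnode processes whose start day (as parsed from ps faux output) is more than two days old and kill them. The field positions are taken from the original command and are assumptions about your ps output format.

#!/bin/bash
# Kill lsnode processes that appear to have been running for more than two days.
# The 2nd and 9th numeric fields of `ps faux` output are assumed to be the PID
# and the start day, as in the original one-liner.
today=$(date +%d)
ps faux | grep lsnode | grep -v grep | while read -r line; do
  pid=$(echo "$line" | awk -F'[^0-9]*' '{print $2}')
  day=$(echo "$line" | awk -F'[^0-9]*' '{print $9}')
  if [ -n "$pid" ] && [ -n "$day" ] && [ $((10#$day + 2)) -lt $((10#$today)) ]; then
    kill -9 "$pid" && echo "NodeJS process $pid killed"
  fi
done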
I also had the problem on Namecheap. Strange that it is always them…
Support told me it had to do with their CageFS and that it can only be fixed/reset via support.
Edit:
support gave me a new cronjob to run
kill -9 $(ps faux | grep node | grep -v grep | awk {'print $2'})
For me, this one is working better than the command from Gerardo.
You can stop unused processes by running this command:
/usr/bin/pkill -9 lsnode
I have encountered exactly the same issue you are describing. My hosting provider for Node.js apps and PHP sites is Namecheap too. Strange that their name keeps popping up in this thread.
This is what Namecheap support said:
According to our check, the issue was caused by the numerous stuck processes generated by the Node.js app. We have removed them and the websites are now up. In case the issue reappears, we would recommend contacting a web developer to optimize your app and/or setting up the following cron job to kill the processes:
/usr/bin/pkill -9 lsnode >/dev/null 2>&1
If you are using cPanel, this article might help you to setup a cron job: https://www.namecheap.com/support/knowledgebase/article.aspx/9453/29/how-to-run-scripts-via-cron-jobs/
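If you go that route, the crontab entry can be as simple as the following (the daily 03:00 schedule is just an example; adjust to taste):

# Kill stuck lsnode processes once a day at 03:00
0 3 * * * /usr/bin/pkill -9 lsnode >/dev/null 2>&1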

Too many open files in system in kubernetes cluster [duplicate]

I am trying to create a bunch of pods, services and deployments using Kubernetes, but keep hitting the following error when I run the kubectl describe command.
for "POD" with RunContainerError: "runContainer: API error (500): Cannot start container bbdb58770a848733bf7130b1b230d809fcec3062b2b16748c5e4a8b12cc0533a: [8] System error: too many open files in system\n"
I have already terminated all pods and tried restarting the machine, but it doesn't solve the issue. I am not a Linux expert, so I am just wondering how I should find all the open files and close them?
You can confirm which process is hogging file descriptors by running:
lsof | awk '{print $2}' | sort | uniq -c | sort -n
That will give you a sorted list of open FD counts with the pid of the process. Then you can look up each process w/
ps -p <pid>
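Combining the two steps, a small sketch that prints the top consumers together with their command names (just a convenience wrapper around the commands above):

# Top 10 file-descriptor consumers, with the command name resolved for each PID
lsof | awk '{print $2}' | sort | uniq -c | sort -rn | head -10 | while read -r count pid; do
  printf "%6s open files  ->  " "$count"
  ps -p "$pid" -o pid=,comm=
done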
If the main hogs are docker/kubernetes, then I would recommend following along on the issue that caesarxuchao referenced.

EC2-Describe-Instances for DNS

I found a script online a few months ago that I changed and came up with the solution below.
It uses EC2-Describe-Instances and Perl to collect the instance names and IP addresses, and then updates Route 53.
It works but it's a bit inefficient. I'm more of a .NET programmer and I am a little out of my depth here, so hopefully someone can help or point me in the right direction.
What I am thinking is that I want it to save a copy of the EC2-Describe-Instances output from the last time it ran, then get a fresh copy, compare the differences, and only run the Route 53 update for instances whose IP has changed. Any ideas?
#!/bin/bash
root=$(dirname "$0")
ec2-describe-instances -O ###### -W ##### --region eu-west-1 |
perl -ne '/^INSTANCE\s+(i-\S+).*?(\S+\.amazonaws\.com)/
and do { $dns = $2; print "$1 $dns\n" }; /^TAG.+\sName\s+(\S+)/
and print "$1 $dns\n"' |
perl -ane 'print "$F[0] CNAME $F[1] --replace\n"' |
grep -v '^i-' |
xargs --verbose -n 4 -I myvar /bin/sh -c '{ /usr/local/bin/cli53 rrcreate -x 300 contoso.com 'myvar'; sleep 1; printf "\n\n"; }'
--edit--
Basically what I need is a way to compare a saved file with the output of EC2-Describe-Instances and then only return lines that contain differences to be fed back into the rest of the code.
Something like:
ChangedLines(File.txt, "ec2-describe-instances -O ###### -W ##### --region eu-west-1") | perl......
If
File 1 =
ABC
DEF
GHI
JKL
Output =
ABC
DEF
GHJ
JKL
Return =
GHJ
Example of EC2-Describe-Instances output
PROMPT> ec2-describe-instances
RESERVATION r-1a2b3c4d 111122223333 my-security-group
INSTANCE i-1a2b3c4d ami-1a2b3c4d ec2-203-0-113-25.compute-1.amazonaws.com ip-10-251-50-12.ec2.internal running my-key-pair 0 t1.micro YYYY-MM-DDTHH:MM:SS+0000 us-west-2a aki-1a2b3c4d monitoring-disabled 184.73.10.99 10.254.170.223 ebs paravirtual xen ABCDE1234567890123 sg-1a2b3c4d default false
BLOCKDEVICE /dev/sda1 vol-1a2b3c4d YYYY-MM-DDTHH:MM:SS.SSSZ true
RESERVATION r-2a2b3c4d 111122223333 another-security-group
INSTANCE i-2a2b3c4d ami-2a2b3c4d ec2-203-0-113-25.compute-1.amazonaws.com ip-10-251-50-12.ec2.internal running my-key-pair 0 t1.micro YYYY-MM-DDTHH:MM:SS+0000 us-west-2c windows monitoring-disabled 50.112.203.9 10.244.168.218 ebs hvm xen ABCDE1234567890123 sg-2a2b3c4d default false
BLOCKDEVICE /dev/sda1 vol-2a2b3c4d YYYY-MM-DDTHH:MM:SS.SSSZ true
I need to capture the lines in which the IP address has changed from the previous run.
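One way to get that "changed lines only" behaviour is to keep the previous run's output on disk and diff it against a fresh run with comm, then feed only the differing lines into the existing perl/cli53 pipeline. A rough sketch (STATE_FILE is a placeholder path, and the credentials/region arguments are the same ones elided above):

#!/bin/bash
# Compare the previous ec2-describe-instances output with a fresh run and
# keep only the lines that changed.
STATE_FILE="$HOME/ec2-describe-instances.last"
CURRENT=$(mktemp)

# Add your -O/-W credentials as in the original script
ec2-describe-instances --region eu-west-1 > "$CURRENT"

touch "$STATE_FILE"
# Lines present in the fresh output but not in the saved copy (new or changed instances)
comm -13 <(sort "$STATE_FILE") <(sort "$CURRENT") > changed_lines.txt

# Save the fresh output for next time
mv "$CURRENT" "$STATE_FILE"

# Feed only the changed lines into the rest of the existing pipeline, e.g.
# cat changed_lines.txt | perl -ne '...' | ... | xargs ... cli53 rrcreate ...
cat changed_lines.txt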
It sounds like your actual goal is to update Amazon Route 53 for newly launched Amazon EC2 instances. There are a few different approaches you could take.
List instances launched during a given period
Use the AWS Command-Line Interface (CLI) to list instances that were recently launched. I found this example on https://github.com/aws/aws-cli/issues/1209:
aws ec2 describe-instances --query 'Reservations[].Instances[?LaunchTime>=`2015-03-01`][].{id: InstanceId, type: InstanceType, launched: LaunchTime}'
Modified for your needs:
aws ec2 describe-instances --query 'Reservations[].Instances[?LaunchTime>=`2015-03-01`][].{id: InstanceId, ip: PrivateIpAddress}' --output text
Let the instance update itself
Thinking a different way, why not have the instances update Amazon Route 53 themselves? Use a start script (via User Data) that calls the AWS CLI to update Route 53 directly!
Instances can retrieve their IP address via instance metadata:
curl http://169.254.169.254/latest/meta-data/public-ipv4
Then call aws route53 change-resource-record-sets to update records.
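A minimal sketch of such a startup script, assuming the AWS CLI and credentials are available on the instance (the hosted zone ID and record name below are placeholders):

#!/bin/bash
# Self-registration script to run from EC2 user data at boot.
HOSTED_ZONE_ID=Z1234567890          # placeholder
RECORD_NAME=myinstance.contoso.com  # placeholder

# The instance discovers its own public IP from instance metadata
PUBLIC_IP=$(curl -s http://169.254.169.254/latest/meta-data/public-ipv4)

# Upsert an A record pointing at this instance
aws route53 change-resource-record-sets \
  --hosted-zone-id "$HOSTED_ZONE_ID" \
  --change-batch "{
    \"Changes\": [{
      \"Action\": \"UPSERT\",
      \"ResourceRecordSet\": {
        \"Name\": \"$RECORD_NAME\",
        \"Type\": \"A\",
        \"TTL\": 300,
        \"ResourceRecords\": [{\"Value\": \"$PUBLIC_IP\"}]
      }
    }]
  }"

The original script created CNAMEs to the EC2 public DNS name; an UPSERT of a CNAME record works the same way, just with Type CNAME and the value taken from the public-hostname metadata path.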

Redis connection based on latency

I'm writing a nodejs webserver that will exist in multiple regions across the world.
When using Redis from Node, is it possible to provide a list of all my Redis servers so that the client connects to the one that's closest based on latency?
Can we assume that you're using Redis 3+?
If so, CLUSTER NODES is your friend.
For a horrible bash example, on a Redis server with Puppet's facter installed,
INITMASTER1ID=$(redis-cli -h $(facter ipaddress) -c CLUSTER NODES | grep $(facter ipaddress) | grep -Eo '^[^ ]+')
NODEIDLIST=$(redis-cli -h $REDIS2 -c CLUSTER NODES | grep -Eo '^[^ ]+')
http://redis.io/commands/cluster-nodes
For some less horrible examples written in Node.JS, check the docs for https://www.npmjs.com/package/redis-cluster.
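If you just want a quick way to see which endpoint is closest from a given node, a rough bash sketch that times a PING against each host (the hostnames below are placeholders) might look like:

#!/bin/bash
# Time a PING against each Redis host and report the fastest one.
REDIS_HOSTS="redis-us.example.com redis-eu.example.com redis-ap.example.com"  # placeholders

best_host=""
best_ms=1000000
for host in $REDIS_HOSTS; do
  start=$(date +%s%N)
  if redis-cli -h "$host" ping > /dev/null 2>&1; then
    elapsed_ms=$(( ($(date +%s%N) - start) / 1000000 ))
    echo "$host answered PING in ${elapsed_ms}ms"
    if [ "$elapsed_ms" -lt "$best_ms" ]; then
      best_ms=$elapsed_ms
      best_host=$host
    fi
  fi
done

echo "Lowest latency host: $best_host (${best_ms}ms)"

The measurement includes connection setup, so it is only a rough proxy for steady-state latency, but it is enough to pick a region.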

Google cloud SDK code to execute via cron

I am trying to implement an automated script to shut down and start VM instances in my Google Cloud account via crontab. The OS is Ubuntu 12 LTS and has a Google service account installed, so it can handle read/write access to my Google Cloud account.
My actual code is in this file /home/ubu12lts/cronfiles/resetvm.sh
#!/bin/bash
echo Y | gcloud compute instances stop my-vm-name --zone us-central1-a
sleep 120s
gcloud compute instances start my-vm-name --zone us-central1-a
echo "completed"
When I call the above file like this,
$ bash /home/ubu12lts/cronfiles/resetvm.sh
It works perfect and does the job.
Now I wanted to set this up in cron so it would do automatically every hour. So I did
$ sudo crontab -e
And added this code in cron
0 * * * * /bin/sh /home/ubu12lts/cronfiles/resetvm.sh >>/home/ubu12lts/cron.log
And made script executable
chmod +x /home/ubu12lts/cronfiles/resetvm.sh
I also tested the crontab by adding a sample command that creates a .txt file with a sample message, and it worked perfectly.
But the above gcloud SDK script doesn't work through cron: the VM neither stops nor starts in my Compute Engine console.
Anyone can help please?
Thank you so much.
You have added the entry to root's crontab, while your Cloud SDK installation is set up for a different user (I am guessing ubu12lts).
You should add the entry to ubu12lts's crontab using:
crontab -u ubu12lts -e
Additionally your entry is currently scheduled to run on the 0th minute every hour. Is that what you intended?
I have run into a similar issue before. I fixed it by forcing the profile in my script.sh, loading the gcloud environment variables with it. Example below:
#!/bin/bash
source /etc/profile
echo Y | gcloud compute instances stop my-vm-name --zone us-central1-a
sleep 120s
gcloud compute instances start my-vm-name --zone us-central1-a
echo "completed"
This also helped me resize node count in GKE.
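An alternative to sourcing /etc/profile is to set PATH explicitly at the top of the crontab so cron can find the gcloud binary. The SDK path below is an assumption; check yours with which gcloud:

# In the crontab, before the job entries; adjust the google-cloud-sdk path as needed
PATH=/usr/local/bin:/usr/bin:/bin:/usr/lib/google-cloud-sdk/bin
0 * * * * /bin/bash /home/ubu12lts/cronfiles/resetvm.sh >> /home/ubu12lts/cron.log 2>&1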
