Linux Split command very slow on AWS Instance

I have deployed my application on an AWS instance, and the application uses some Linux system commands which are called via a simple shell script.
Below is the sample script content:
#!/bin/bash
echo "File Split started"
cd /prod/data/java/
# -a 5: five-character output suffixes; -l 100000000: 100 million lines per piece
split -a 5 -l 100000000 samplefile.dat
echo "File Split Completed"
The same script runs much faster on our local server: there it takes 15 minutes to complete, but on the AWS instance it takes 45 minutes. That is a huge difference.
Update: On AWS it is also barely using any CPU, hardly 2 to 5 percent, which is why it is so slow.
Both are 64-bit OSes; the local server is RHEL 5 (kernel 2.6.18) and the AWS instance is RHEL 6 (kernel 2.6.xx).
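To check whether the disk is the bottleneck, I can compare raw read throughput on both machines with something like this (a rough diagnostic sketch; the block size and the iostat interval are arbitrary choices, and iostat needs the sysstat package):
cd /prod/data/java/
# time a large sequential read of the input file
time dd if=samplefile.dat of=/dev/null bs=1M
# in a second terminal, watch per-device utilisation while split runs
iostat -xm 5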
Can anyone help on this?
Regards,
Shankar

Related

Bash script results in different output when running from a cron job

I'm puzzled by a problem I'm having on Ubuntu 20.04 where cron is able to run a bash script, but the overall outcome is different than when running it from the shell.
I've looked through all the questions I could find here and on Google but couldn't find anyone who had the same problem.
Background:
I'm using Pushgateway to store metrics I'm generating through a bash script, and afterwards it's being imported automatically to Prometheus.
The end goal is to export a list of running processes, their CPU%, Mem%, etc., similar to the top command.
This is the bash script:
#!/bin/bash
# one batch-mode snapshot of top; -b: batch output, -i: skip idle processes
z=$(top -n 1 -bi)
# the herestring feeds both read (which consumes the first line) and awk (which consumes the rest)
while read -r z
do
var=$var$(awk 'FNR>7{print "cpu_usage{process=\""$12"\", pid=\""$1"\"}", $9z} FNR>7{print "memory_usage{process=\""$12"\", pid=\""$1"\"}", $10z}')
done <<< "$z"
curl -X POST -H "Content-Type: text/plain" --data "$var
" http://localhost:9091/metrics/job/top/instance/machine
I used to have a version that used ps aux but then I found out that it only shows the average CPU% per process.
As you can see, the command I'm running is top -n 1 -bi, which gives me a snapshot of active processes and their metrics.
I'm using awk to format the data, and FNR>7 because I need to ignore the first 7 lines, which are the summary presented by top.
The bash script is installed in /bin, /usr/bin and /usr/local/bin.
When checking http://localhost:9091/metrics, which is supposed to show the information gathered, this is some of the information I get when running the script from the shell:
cpu_usage{instance="machine",job="top",pid="114468",process="php-fpm74"} 17.6
cpu_usage{instance="machine",job="top",pid="114483",process="php-fpm74"} 11.8
cpu_usage{instance="machine",job="top",pid="126305",process="ffmpeg"} 64.7
And this is the same information when cron is running the same script:
cpu_usage{instance="machine",job="top",pid="114483",process="php-fpm+"} 5
cpu_usage{instance="machine",job="top",pid="126305",process="ffmpeg"} 60
cpu_usage{instance="machine",job="top",pid="128777",process="php"} 15
So, for some reason, when I run it from cron it truncates the process name after 7 characters.
I initially thought it was related to the FNR>7, but even after changing it to 8 or 9 (and using exec bash to re-register the command) it gives the same results; when I run it manually it works just fine.
Any help would be appreciated!!
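One thing worth checking (an educated guess rather than a confirmed cause): top sizes its COMMAND column to the terminal width, and under cron there is no terminal, so it falls back to a narrow default and truncates long names with a trailing +. The procps-ng top shipped with Ubuntu 20.04 accepts -w to force the output width, so the capture line could become:
# force a wide output so the COMMAND column is not truncated when there is no tty
z=$(top -w 512 -n 1 -bi)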

Half of Crontab script works, the other half doesn't

I'm having some difficulty with my crontab. I've run through this list: https://askubuntu.com/questions/23009/reasons-why-crontab-does-not-work, which was helpful. I have the following crontab:
35 02 * * * /root/scripts/backup
Which runs:
#!/bin/bash
PATH=/home:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
echo "Time: $(date)" >> /home/me/SystemSync.log
rsync -ar -e ssh /home/ root@IPADDRESSOFNAS:/volume1/backup/server/SystemSSD/
echo "Time: $(date)" >> /SecondHD/RsyncDebug.log
sudo rsync -arv -e ssh /SecondHD/ root@IPADDRESSOFNAS:/volume1/backup/server/SecondHD/
The .log files are written to the server, with the intention that they will be synced to the NAS on a successful run.
I know the following:
The SystemSSD portion of the script works, as do both of the log portions. On the NAS, I see the SystemSync.log file with the newest entry. I find RsyncDebug.log on the server with the updated time stamp, but not on the NAS.
The script runs in its entirety when I run it from the command line, just not in crontab.
Potentially pertinent information:
I'm running CentOS 6 on the server.
The system drive is a 1TB SSD and the Second drive is a 4 TB Raid1 hard drive with ca. 1 TB of space remaining.
The NAS volume has 5TB drives, with about 1 TB of space remaining.
Thanks in advance. Someday I hope to teach as much as I learn from this community.
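A first debugging step might be to capture the cron job's own stdout and stderr, so the error from the second rsync is not lost; a sketch (the log path is an assumption):
35 02 * * * /root/scripts/backup >> /root/backup-cron.log 2>&1
If sudo is what separates the working half from the failing half, a password prompt or a "sudo: sorry, you must have a tty to run sudo" message (the requiretty default on CentOS 6) would show up in that log.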

Google Cloud SDK code to execute via cron

I am trying to implement automated code to shut down and start VM instances in my Google Cloud account via crontab. The OS is Ubuntu 12.04 LTS and has a Google service account installed, so it can read and write to my Google Cloud account.
My actual code is in this file /home/ubu12lts/cronfiles/resetvm.sh
#!/bin/bash
echo Y | gcloud compute instances stop my-vm-name --zone us-central1-a
sleep 120s
gcloud compute instances start my-vm-name --zone us-central1-a
echo "completed"
When I call the above file like this,
$ bash /home/ubu12lts/cronfiles/resetvm.sh
It works perfectly and does the job.
Now I wanted to set this up in cron so it would run automatically every hour. So I did
$ sudo crontab -e
And added this code in cron
0 * * * * /bin/sh /home/ubu12lts/cronfiles/resetvm.sh >>/home/ubu12lts/cron.log
And made script executable
chmod +x /home/ubu12lts/cronfiles/resetvm.sh
I also tested the crontab by adding a sample command that creates a .txt file with a sample message, and it worked perfectly.
But the above gcloud SDK code doesn't work through cron: the VM neither stops nor starts in my GC compute engine.
Anyone can help please?
Thank you so much.
You have added the entry to root's crontab, while your Cloud SDK installation is set up for a different user (I am guessing ubu12lts).
You should add the entry to ubu12lts's crontab using:
crontab -u ubu12lts -e
Additionally your entry is currently scheduled to run on the 0th minute every hour. Is that what you intended?
I have run into a similar issue before. I fixed it by forcing the profile in my script.sh, loading the gcloud environment variables with it. Example below:
#!/bin/bash
source /etc/profile
echo Y | gcloud compute instances stop my-vm-name --zone us-central1-a
sleep 120s
gcloud compute instances start my-vm-name --zone us-central1-a
echo "completed"
This also helped me resize node count in GKE.
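An equivalent approach (a sketch; the SDK install path is an assumption) is to set the environment directly in the crontab, so the entry finds gcloud without sourcing a profile, and to capture stderr as well:
PATH=/home/ubu12lts/google-cloud-sdk/bin:/usr/bin:/bin
0 * * * * /home/ubu12lts/cronfiles/resetvm.sh >> /home/ubu12lts/cron.log 2>&1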

How to trigger a certain shell script on multiple Linux machines simultaneously?

I have a batch script which has to trigger a certain shell script on some 10 Linux machines through plink [PuTTY].
But when I trigger the shell script, the problem is that control passes to the shell script. It runs for some 10 hours and then returns control to the batch script. Only then does my batch proceed to the 2nd Linux machine, wait another 10 hours, and so on...
My requirement is to trigger the shell script on all the Linux machines simultaneously.
Something like: trigger the shell script, return control to the batch script, and trigger it on the next machine, would also be fine.
Assuming a modern Linux/Unix shell, the answer to your question is to append the & character to the line that makes the connection to the remote machine; & means 'run this process in the background'.
So something like
#!/bin/bash
# master script
ssh host1 "/path/to/remote/script/runMe" &
ssh host2 "/path/to/remote/script/runMe" &
ssh host3 "/path/to/remote/script/runMe" &
ssh host4 "/path/to/remote/script/runMe" &
Sounds like what you are looking for.
Capturing log information and monitoring the status of the remote "runMe"s is in the realm of consulting engagements, or at least 100 hours of experimentation on your part. Good luck.
IHTH
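If the master script should only continue once every remote script has finished, a wait at the end blocks until all background jobs complete. A small extension of the sketch above (the loop over hostnames is just a compact rewrite):
#!/bin/bash
# master script: launch all remote scripts in parallel, then block until all finish
for host in host1 host2 host3 host4; do
    ssh "$host" "/path/to/remote/script/runMe" &
done
wait
echo "all remote runMe scripts have finished"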
With GNU Parallel you can do:
parallel --nonall -S host1,host2,host3,host4 /path/to/remote/script/runMe
GNU Parallel guarantees the output from stdout and stderr is captured and not mixed. With --tag each line will be prepended with the host:
parallel --tag --nonall -S host1,host2,host3,host4 /path/to/remote/script/runMe >output.stdout 2>output.stderr
To learn more watch the intro videos: https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
10 seconds installation:
wget -O - pi.dk/3 | bash

Unix: Run a script on a different machine at a particular time

I am working on a script which gets the script name, the time to run that script, and the login/host name from a configuration file.
I don't have cron, at, or crontab permission.
Is there any other way to run a script at the given time (set in a configurable file) from another script running on a different host?
In Detail:
script_A reads a configuration file from which it gets three inputs: script_B, the time to run it (ddmmyyyy h24:mm:ss), and login1@machine1. script_B has to be run at the given time on the given host.
None of the connected machines have cron, crontab, or at permissions.
I am using Solaris.
Could we have something like this in Unix: script_A creates a script_C which wraps script_B with a check on the time parameter. script_C is then copied to the remote machine and keeps running there in the background until the given time is reached; once the time has come, it executes script_B (at the location on the remote host given in the config file) and exits.
Thanks.
If you want to execute command foo at epoch time xxxxxxxxxx on host, you could do:
$ delay=$((xxxxxxxxxx - $(date +%s))); test $delay -gt 0 && sleep $delay && ssh host foo
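Following the script_C idea from the question, script_A could also generate a small sleeper script and ship it to the remote host. A sketch, assuming the configured ddmmyyyy h24:mm:ss stamp has already been converted to epoch seconds and passwordless ssh/scp works (all paths are illustrative; as above, date +%s may need GNU date on Solaris):
#!/bin/bash
# script_A: generate script_C, copy it to the remote host, start it in the background
target_epoch=$1      # run time, already converted to epoch seconds
remote=$2            # e.g. login1@machine1
remote_script=$3     # e.g. /path/to/script_B

cat > /tmp/script_C <<EOF
#!/bin/bash
# script_C: sleep until the target time, then run script_B and exit
delay=\$(( $target_epoch - \$(date +%s) ))
test "\$delay" -gt 0 && sleep "\$delay"
exec $remote_script
EOF

scp /tmp/script_C "$remote:/tmp/script_C"
ssh "$remote" 'chmod +x /tmp/script_C && nohup /tmp/script_C >/dev/null 2>&1 &'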
The simplest method is to compile cron from source and deploy it on the target machine. Every time your code gets any kind of control over the machine, check if your cron daemon is running (classic PID file) and start it if necessary.
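The PID-file check could look something like this (a sketch; the install path is hypothetical, and it assumes the cron build runs in the foreground so the shell can record its PID):
#!/bin/bash
# run from any code path that gets control of the machine:
# restart the privately compiled cron if it is no longer running
PIDFILE="$HOME/.cron/cron.pid"
if [ ! -f "$PIDFILE" ] || ! kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
    "$HOME/.cron/sbin/cron" &    # hypothetical install path of the self-compiled cron
    echo $! > "$PIDFILE"
fi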
A warning, though:
This is a social problem and a technical solution. Your mileage will be low.
