Script should run on its own during server startup - Linux

We are using the JON tool to monitor our infrastructure. We set up the thresholds for RAM usage (60% and 65% of total RAM) using the tool's GUI.
If the RAM size of a server (which is in the cloud) is increased, we need to change the threshold level manually in the GUI. To avoid that, I wrote a shell script which uses the JON CLI to update the RAM threshold based on the current RAM size. The script works without problems.
For example, if the RAM size is initially 8 GB, we set the threshold (65% of 8 GB) based on that size. If the size is later increased to 16 GB, we need to set the threshold (65% of 16 GB) manually. To avoid that, I created the shell script which uses the JON CLI to update the threshold value. (During maintenance they shut down the servers and increase the RAM size as needed.)
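The threshold arithmetic described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names and the sample text are made up, the original script is a shell script that is not shown, and the JON CLI call itself is left out because its syntax is not given here.

```python
def mem_total_kb(meminfo_text):
    """Return the MemTotal value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            return int(line.split()[1])
    raise ValueError("MemTotal not found")


def thresholds_kb(total_kb, percents=(60, 65)):
    """Map each percentage to the matching absolute amount of RAM in kB."""
    return {p: total_kb * p // 100 for p in percents}


# Made-up sample standing in for the contents of /proc/meminfo on a 16 GB host.
sample = "MemTotal:       16777216 kB\nMemFree:         1234567 kB\n"
print(thresholds_kb(mem_total_kb(sample)))
```

On a real server the text would come from reading /proc/meminfo, and the resulting values would be handed to whatever command updates the alert definition.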
Problem:
If the server's RAM size is increased, I need to run the script manually to set the threshold. Since the servers are brought down for the size change, the script needs to run on its own once the server is started again. So I placed my script in the /etc/rc.local file. Recently the team increased the RAM size and started the server, but the threshold did not change (which means the script did not run on its own). So I ran the script manually to update the threshold.
Expectation:
The script should run on its own during server startup.
Flavour: CentOS 6.5
Even though it is a basic thing, please guide and help me with this.

If I understand the problem correctly,
your script does not start from /etc/rc.local.
Please check whether /etc/rc.local is executed at all.
To do that, add something like:
touch /tmp/created-by-rc.local
to it and restart the server.
After that you will know whether /etc/rc.local is run,
and depending on that you can go one way or the other.
Alternatively, you can create your own init script for your script.
Check this article, where the procedure is described in detail:
https://techarena51.com/index.php/how-to-create-an-init-script-on-centos-6/

Related

Is it unhealthy for an SSD if I write a 'vital sign' to check that Python code is running?

A Python program that I'm building used to die for no apparent reason. I couldn't figure out why, so my workaround was to add a few lines that write the time to a 'vitality' file every time a certain line within the program is executed, which happens about every 0.1 seconds.
A separate script reads the 'vitality' file every second, and when the vital sign doesn't update for, say, 10 seconds, the script kills the program and restarts it.
So far this workaround has worked well for the original problem, but now I'm rather concerned about whether the SSD will degrade because of it.
Does writing the 10 digits of a Unix timestamp to a file every 0.1 s have a negligible effect on SSD health, or would it degrade the SSD fast?
Doing that will degrade the SSD and destroy it over time.
In my last job, the SSD health tool (smartctl) indicated that the 15 SSDs in our cluster product were wearing rapidly and had only months of life left. The team found that a third-party software package (etcd) was syncing small amounts of data to a filesystem on SSD once per second, and each sync wrote at least an entire 16K block. Luckily, the problem was found early enough that we could patch it in a software update before suffering too many customer returns.
Write the 'vitality' file somewhere else. It could be on a tmpfs like /var/run/user/. Or use a different vitality mechanism; something like supervisord can manage your task, run health checks and restart it on failure.
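A minimal sketch of the tmpfs idea, assuming Python on Linux. The function names are made up for illustration, and using XDG_RUNTIME_DIR to find the per-user runtime directory is an assumption; the code falls back to the system temp directory when that variable is not usable.

```python
import os
import tempfile
import time


def heartbeat_dir():
    """Prefer the per-user runtime directory (normally a tmpfs), else the temp dir."""
    d = os.environ.get("XDG_RUNTIME_DIR")
    if d and os.path.isdir(d) and os.access(d, os.W_OK):
        return d
    return tempfile.gettempdir()


def write_heartbeat(path):
    """Record the current Unix time, replacing the file atomically."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(str(int(time.time())))
    os.replace(tmp, path)


def is_stale(path, max_age=10.0, now=None):
    """True if the heartbeat file is missing or older than max_age seconds."""
    now = time.time() if now is None else now
    try:
        return now - os.path.getmtime(path) > max_age
    except OSError:
        return True


# Demo: write one heartbeat, check it, and clean up again.
heartbeat = os.path.join(heartbeat_dir(), "vitality-demo-%d" % os.getpid())
write_heartbeat(heartbeat)
print(is_stale(heartbeat))
os.remove(heartbeat)
```

The monitored program would call write_heartbeat in its loop and the watchdog would call is_stale once per second. Because the file lives in RAM, it disappears on reboot, which is fine for a liveness signal.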

Distributed Processing of Volumetric Image Data

For the development of an object recognition algorithm, I need to repeatedly run a detection program on a large set of volumetric image files (MR scans).
The detection program is a command line tool. If I run it on my local computer on a single file and single-threaded it takes about 10 seconds. Processing results are written to a text file.
A typical run would be:
10000 images with 300 MB each = 3 TB
10 seconds on a single core = 100000 seconds = about 28 hours
What can I do to get the results faster? I have access to a cluster of 20 servers with 24 (virtual) cores each (Xeon E5, 1TByte disks, CentOS Linux 7.2).
Theoretically the 480 cores should only need 3.5 minutes for the task.
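The back-of-the-envelope numbers above can be checked with a short calculation. It uses only the figures given in the question and assumes perfect scaling with no I/O or scheduling overhead, so the result is a lower bound rather than a prediction.

```python
images = 10000
mb_per_image = 300
seconds_per_image = 10
servers, cores_per_server = 20, 24

cores = servers * cores_per_server                # 480 cores in total
total_core_seconds = images * seconds_per_image   # work for a single core
single_core_hours = total_core_seconds / 3600
ideal_minutes = total_core_seconds / cores / 60   # perfect scaling assumed

total_tb = images * mb_per_image / 1_000_000      # decimal units
per_server_gb = images * mb_per_image / 1000 / servers

print(f"{single_core_hours:.1f} h on one core, {ideal_minutes:.1f} min on {cores} cores")
print(f"{total_tb:.1f} TB in total, {per_server_gb:.0f} GB per server")
```

The per-server share of the data is well below the 1 TB local disks, so storing each node's files locally is at least feasible in terms of capacity.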
I am considering using Hadoop, but it's not designed for processing binary data, and it splits input files, which is not an option.
I probably need some kind of distributed file system. I tested using NFS and the network becomes a serious bottleneck. Each server should only process its locally stored files.
The alternative might be to buy a single high-end workstation and forget about distributed processing.
I am not certain whether we need data locality, i.e. each node holding part of the data on a local disk and processing only its local data.
I regularly run large scale distributed calculations on AWS using Spot Instances. You should definitely use the cluster of 20 servers at your disposal.
You don't mention which OS your servers are using but if it's linux based, your best friend is bash. You're also lucky that it's a command line programme. This means you can use ssh to run commands directly on the servers from one master node.
The typical sequence of processing would be:
Run a script on the Master Node which sends and runs scripts via ssh on all the Slave Nodes.
Each Slave Node downloads its section of the files from the Master Node, where they are stored (via NFS or scp).
Each Slave Node processes its files, saving the required data via scp, MySQL or text scrape.
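The sequence above could be sketched roughly as follows. This is only an illustration: the host names, file names and the remote script name are hypothetical, the files are assigned round-robin (one of several reasonable choices), and the ssh commands are only built here, not executed.

```python
import shlex


def partition(files, nodes):
    """Round-robin assignment: file i goes to node i % len(nodes)."""
    plan = {node: [] for node in nodes}
    for i, name in enumerate(files):
        plan[nodes[i % len(nodes)]].append(name)
    return plan


def ssh_command(node, remote_script, files):
    """Build (but do not run) the ssh argument list for one node."""
    remote = " ".join(shlex.quote(x) for x in [remote_script, *files])
    return ["ssh", node, remote]


nodes = ["node%02d" % n for n in range(1, 21)]       # hypothetical host names
files = ["scan%05d.img" % n for n in range(10000)]   # hypothetical file names
plan = partition(files, nodes)

# Show the first command, truncated to two files for readability.
print(ssh_command(nodes[0], "./detect.sh", plan[nodes[0]][:2]))
```

Each argument list can then be passed to subprocess.run on the Master Node. With 10000 files and 20 nodes, every node ends up with 500 files.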
To get started, you'll need to have ssh access to all the Slaves from the Master. You can then scp files, such as the script, to each Slave. If you're running on a private network, you don't have to be too concerned about security, so just set ssh passwords to something simple.
In terms of CPU cores, if the command line program you're using isn't designed for multi-core, you can just run several ssh commands to each Slave. The best thing to do is run a few tests and see what the optimal number of processes is, given that too many processes might be slow due to insufficient memory, disk access or similar. But say you find that 12 simultaneous processes give the fastest average time, then run 12 scripts via ssh simultaneously.
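One way to run such a test is a small harness that starts a fixed number of commands at a time and measures the wall-clock time. The sketch below is an assumption about how this could be done in Python; the sleeping child process is only a stand-in for the real detection program, and run_batch is a made-up name.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_batch(commands, workers):
    """Run argument lists with at most `workers` at once; return (exit codes, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        codes = list(pool.map(lambda argv: subprocess.run(argv).returncode, commands))
    return codes, time.perf_counter() - start


# Stand-in workload: each "job" is a Python child process that sleeps briefly.
job = [sys.executable, "-c", "import time; time.sleep(0.05)"]
for workers in (1, 4):
    codes, elapsed = run_batch([job] * 4, workers)
    print(workers, "workers:", codes, round(elapsed, 2), "s")
```

Replacing the stand-in job with the real command (or with an ssh command to a Slave) and trying several values of workers shows where adding more processes stops helping.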
It's not a small job to get it all done; however, once it is set up you will be able to process in a fraction of the time.
You can use Hadoop. Yes, the default implementations of FileInputFormat and RecordReader split files into chunks and chunks into lines, but you can write your own implementations of FileInputFormat and RecordReader. I've created a custom FileInputFormat for another purpose (I had the opposite problem: splitting the input data more finely than the default), but there are good-looking recipes for exactly your problem: https://gist.github.com/sritchie/808035 plus https://www.timofejew.com/hadoop-streaming-whole-files/
On the other hand, Hadoop is a heavy beast. It has significant overhead for mapper startup, so the optimal running time for a mapper is a few minutes. Your tasks are too short. Maybe it is possible to create a cleverer FileInputFormat which interprets a bunch of files as a single file and feeds the files as records to the same mapper; I'm not sure.

Limiting the memory usage of a program in Linux

I'm new to Linux and Terminal (or whatever kind of command prompt it uses), and I want to control the amount of RAM a process can use. I already looked for hours to find an easy-to-use guide. I have a few requirements for limiting it:
Multiple instances of the program will be running, but I only want to limit some of the instances.
I do not want the process to crash once it exceeds the limit. I want it to use HDD page swap.
The program will run under WINE, and is a .exe.
So can somebody please help with the command to limit the RAM usage on a process in Linux?
The fact that you’re using Wine makes no difference in this particular context, which leaves requirements 1 and 2. Requirement 2 –
I do not want the process to crash once it exceeds the limit. I want it to use HDD page swap.
– is known as limiting the resident set size or rss of the process, and it’s actually rather nontrivial to do on Linux, as is demonstrated by a question asked in 2010. You’ll need to set up Linux control groups (cgroups). Fortunately, Justin L.’s answer gives a brief rundown on how to do so. Note that
instead of jlebar, you should use your own Unix user name, and
instead of your/program, you should use wine /path/to/Windows/program.exe.
Using cgroups will also satisfy your other requirements – you can start as many instances of the program as you wish, but only those which you start with cgexec -g memory:limited will be limited.
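As a small illustration of limiting only some instances, a launcher could prepend the cgexec prefix for the instances that should be limited and start the others directly. This sketch only builds the argument lists. It assumes that a memory cgroup named limited has already been created as described in the linked answer, and launch_argv is a made-up helper name.

```python
def launch_argv(exe_path, limited, group="limited"):
    """Return the argument list for one Wine instance, optionally inside the cgroup."""
    argv = ["wine", exe_path]
    if limited:
        # Same prefix as in the text above: cgexec -g memory:limited
        argv = ["cgexec", "-g", "memory:" + group] + argv
    return argv


exe = "/path/to/Windows/program.exe"
print(launch_argv(exe, limited=True))
print(launch_argv(exe, limited=False))
```

Each list can be passed to subprocess.Popen. Instances started without the prefix are not affected by the limit.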

Profiling very long running tasks

How can you profile a very long running script that is spawning lots of other processes?
We have a job that takes a long time to run - 11 or more hours, sometimes more than 17 - so it runs on an Amazon EC2 instance.
(It is doing cufflinks DNA alignment and stuff.)
The job is executing lots of processes, scripts and utilities and such.
How can we profile it and determine which component parts of the job take the longest time?
A simple CPU utilisation per process per second would probably suffice. How can we obtain it?
There are many solutions to your question:
munin is a great monitoring tool that can scan almost everything in your system and make nice graphs of it :). It is very easy to install and use.
atop could be a simple solution: it can sample CPU, memory and disk regularly, and you can store all that information in files (the -w option); then you'll have to analyze those files to detect the bottleneck.
sar can scan just about everything on your system, but is a little harder to interpret (you'll have to make the graphs yourself, with RRDtool for example).

High %wa CPU load when running PHP as CLI

Sorry for the vague question, but I've just written some PHP code that executes itself as CLI, and I'm pretty sure it's misbehaving. When I run "top" on the command line, it shows very few resources given to any individual process, but between 40% and 98% going to iowait time (%wa). I usually have about 0.7% distributed between %us and %sy, with the remaining resources going to idle (somewhere between 20% and 50% usually).
This server is executing MySQL queries in, easily, 300x the time it takes other servers to run the same query, and it even takes what seems like forever to log on via SSH... so despite there being some idle CPU time left over, it seems clear that something very bad is happening. Whatever scripts are running are updating my MySQL database, but they seem to be exponentially slower than when they started.
I need some ideas to serve as launch points for me to diagnose what's going on.
Some things that I would like to know are:
How I can confirm how many scripts are actually running
Is there any way to confirm that these scripts are actually shutting down when they are through, and not just "hanging around" taking up CPU time and memory?
What kind of bottlenecks should I be checking to make sure I don't create too many instances of this script, so this doesn't happen again?
I realize this is probably a huge question, but I'm more than willing to follow any links provided and read up on this... I just need to know where to start looking.
High iowait means that your disk bandwidth is saturated. This might be just because you're flooding your MySQL server with too many queries, and it's maxing out the disk trying to load the data to execute them.
Alternatively, you might be running low on physical memory, causing large amounts of disk IO for swapping.
To start diagnosing, run vmstat 60 for 5 minutes and check the output: the si and so columns show swap-in and swap-out, and the bi and bo columns show other IO. (Edit your question and paste the output in for more assistance.)
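If you would rather check the columns programmatically, a rough sketch in Python could look like this. The sample output is made up for illustration (it is not from the server in question), and parse_vmstat and is_swapping are hypothetical helper names.

```python
SAMPLE = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 1  3 204800  51200  10240 409600  120  340  2500  1800  900 1500  1  1 30 68  0
 0  4 215040  49152  10240 401408  200  410  2700  1600  950 1400  1  0 25 74  0
"""


def parse_vmstat(text):
    """Turn vmstat output into a list of {column name: int} dicts, one per sample."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header = next(fields for fields in lines if fields[0] == "r")
    rows = []
    for fields in lines:
        # Skip the (possibly repeated) header lines; keep only numeric rows.
        if len(fields) == len(header) and fields[0].isdigit():
            rows.append(dict(zip(header, map(int, fields))))
    return rows


def is_swapping(rows):
    """True if any sample shows swap-in or swap-out activity."""
    return any(r["si"] > 0 or r["so"] > 0 for r in rows)


rows = parse_vmstat(SAMPLE)
for r in rows:
    print("si", r["si"], "so", r["so"], "bi", r["bi"], "bo", r["bo"], "wa", r["wa"])
print("swapping:", is_swapping(rows))
```

Keep in mind that the first line vmstat prints reports averages since boot, so it is the later samples that reflect the current load.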
High iowait may mean you have a slow/defective disk. Try checking it out with a S.M.A.R.T. disk monitor.
http://www.linuxjournal.com/magazine/monitoring-hard-disks-smart
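To confirm how many scripts are actually running: ps auxww | grep SCRIPTNAME
To confirm that the scripts are actually shutting down: same.
Why are you running more than one instance of your script to begin with?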
