Ksh cannot allocate memory (Linux)

I have two big arrays of strings (each of them holds ~90000 elements).
I create them with the set -A command.
And I need to figure out which strings in the first array have no matching string in the second.
My code:
for i in {0..${#hard_drive_files[*]}}; do
    has_reference=false
    for j in {0..${#files_in_db[*]}}; do
        if [[ ${files_in_db[j]} == ${hard_drive_files[i]} ]]; then
            has_reference=true
            break
        fi
    done
    if [[ $has_reference == false ]]; then
        echo "${hard_drive_files[i]}"
    fi
done
This part of the code "eats" too much memory: by the end of its execution, ~80000 MB of memory is in use.
After this part of the code I try to archive some files, but get: cannot fork [Cannot allocate memory]
Is there a solution to this problem?
P.S.
kshVersion=Version AJM 93t+ 2010-02-02
To figure out how much RAM is used, I execute free -m

I assume there is a specific reason to do it in ksh? Try to estimate how much memory such a table needs, and compare that to the amount of RAM and swap. I bet it's not the program specifically, but some per-process memory/swap limit in limits.conf or sysctl.conf.
You could also try to split the data into groups, e.g. by the first letter of the name, to decrease the amount of memory needed. Your code is probably also far from optimal: it is better to gather all the information you need once and then reuse it, instead of repeating the whole search in a nested loop as you are doing.
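To make that last point concrete: the nested loop does ~90000 × 90000 comparisons, while building a lookup table once reduces it to two linear passes. A minimal sketch, assuming associative-array support (ksh93 and bash 4+ both have it via typeset -A; shown here in bash, with made-up sample data standing in for the real arrays):

```shell
#!/bin/bash
# Index files_in_db once, then probe the index for each hard drive file:
# O(n + m) work instead of the O(n * m) nested loop.
hard_drive_files=(a.txt b.txt c.txt)   # stand-ins for the real ~90000-entry arrays
files_in_db=(a.txt c.txt)

typeset -A in_db
for f in "${files_in_db[@]}"; do
    in_db[$f]=1
done
for f in "${hard_drive_files[@]}"; do
    [[ ${in_db[$f]-} ]] || printf '%s\n' "$f"   # not found in the DB list
done
```

If the lists can pass through files, sorting both and running comm -23 on them prints the entries unique to the first list in one pass, and avoids holding two huge shell arrays in memory at all.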

Related

Linux - read or collect file content faster (e.g. cpu temp every sec.)

I'm working on a system running Ubuntu. I'm reading basic data like CPU frequency and temperature out of the thermal zones provided in /sys/class/thermal.
Unfortunately, I've got around 100 thermal_zones from which I need to read the data. I do it with:
for SENSOR_NODE in /sys/class/thermal/thermal_zone*; do printf "%s: %s\n" $(cat ${SENSOR_NODE}/type) $(cat ${SENSOR_NODE}/temp); done
Collecting all the data takes ~2.5-3 sec., which is way too long.
Since I want to collect the data every second, my question is: is there a way to "read" or "collect" the data faster?
Thank you in advance
There's only so much you can do while writing your code in shell, but let's start with the basics.
Command substitutions, $(...), are expensive: They require creating a FIFO, fork()ing a new subprocess, connecting the FIFO to that subprocess's stdout, reading from the FIFO and waiting for the commands running in that subshell to exit.
External commands, like cat, are expensive: They require linking and loading a separate executable; and unless you launch them with exec (in which case they replace the shell and inherit its process ID), they also require a new process to be fork()ed off.
All POSIX-compliant shells give you a read command:
for sensor_node in /sys/class/thermal/thermal_zone*; do
    read -r sensor_type <"$sensor_node/type" || continue
    read -r sensor_temp <"$sensor_node/temp" || continue
    printf '%s: %s\n' "$sensor_type" "$sensor_temp"
done
...which lets you avoid the command substitution overhead and the overhead of cat. However, read reads content only one byte at a time; so while you're not paying that overhead, it's still relatively slow.
If you switch from /bin/sh to bash, you get a faster alternative:
for sensor_node in /sys/class/thermal/thermal_zone*; do
    printf '%s: %s\n' "$(<"$sensor_node/type")" "$(<"$sensor_node/temp")"
done
...as $(<file) doesn't need to do the one-byte-at-a-time reads that read does. It's only faster because it's bash, though; that doesn't mean it's actually fast. There's a reason modern production monitoring systems are typically written in Go or with a JavaScript runtime like Node.
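Another way to cut the fork count is to hand all of the files to a single awk process: one fork in total instead of two command substitutions per zone. A sketch, assuming each type/temp file holds exactly one line (as sysfs provides); the function name and parameter are my own:

```shell
#!/bin/sh
# Print "type: temp" for every thermal zone under a base directory using
# one awk invocation. Files are passed as type,temp pairs, so odd NR lines
# are zone types and even NR lines are temperatures.
read_temps() {
    base=$1
    set --
    for z in "$base"/thermal_zone*; do
        [ -e "$z/type" ] && [ -e "$z/temp" ] || continue
        set -- "$@" "$z/type" "$z/temp"
    done
    [ "$#" -gt 0 ] || return 0
    awk 'FNR == 1 { if (NR % 2) t = $0; else printf "%s: %s\n", t, $0 }' "$@"
}

read_temps /sys/class/thermal
```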

How to split a large variable?

I'm working with large variables, and "looping" through them with while read line can be very slow; I found out that the smaller the variable, the faster it works.
How can I split a large variable into smaller variables and then read them one by one?
for example,
What I would like to achieve:
bigVar=$(echo "$bigVar" | split_var)
for var in "${bigVar[@]}"; do
    while read line; do
        ...
    done <<< "${var}"
done
or maybe split it into bigVar1, bigVar2, bigVar3, etc., and then read them one by one.
Instead of doing
bigVar=$(someCommand)
while read line
do
    ...
done <<< "$bigVar"
Use
while read line
do
    ...
done < <(someCommand)
This way, you avoid the problem with big variables entirely, and someCommand can output gigabyte after gigabyte with no problem.
If the reason you put it in a variable was to do work in multiple steps on it, rewrite it as a pipeline.
If bigVar is made of words, you could use xargs to split it into lines no longer than the maximum length of a command line, usually 32 KB or 64 KB:
someCommand | xargs | while read line
do
    ...
done
In this case xargs uses its default command, which is echo.
I'm curious about what you want to do in the while loop, as it may be optimized with a pipeline.
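To illustrate that last point: a while read loop that filters or reformats lines can usually be collapsed into a single streaming awk (or grep/sed) process, which never stores the whole output. A hypothetical someCommand stands in for the real producer here:

```shell
#!/bin/sh
# Producer stand-in for the real someCommand.
someCommand() { printf 'alice 42\nbob 7\ncarol 19\n'; }

# Instead of: someCommand | while read line; do ...; done
# let awk do the per-line work in one process:
someCommand | awk '$2 > 10 { print $1 ":" $2 }'   # prints alice:42 and carol:19
```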

Write a certain hex pattern to a file from bash

I am trying to do some memory tests and am trying to write a certain hex pattern to a regular file from bash. How would I go about doing this without using the xxd or hexdump tool/command?
Thanks,
Neco
The simplest thing is probably:
printf '\xde\xad\xbe\xef' > file
but it is often more convenient to do
perl -e 'print pack "H*", "deadbeef"' > file
If I get your question correctly, printf should do:
$ printf %X 256
100
Can you use od -x instead? That's pretty universally available; od has been around since the dawn of time[1].
[1] Not really the dawn of time.
There are multiple ways to do this in bash; one of them is
\x31
'\' is used to escape the next character from bash decoding
'x' shows it's a hex number
echo -en \\x31\\x32\\x33 > test
-n to avoid the trailing newline (else 0x0A will be appended at the end)
-e to interpret backslash escapes
Memory testing is a much more complex subject than just writing/reading patterns in memory. Memory testing puts pretty hard limits on what a testing program can do and what state the whole system is in. Technically, it's impossible to test 100% of memory while running a regular OS at all.
On the other hand, you can run a real test program from a shell, or schedule a test execution on the next boot with some clever hacking around. You might want to take a look at how it's done in Inquisitor, i.e. running memtester for in-OS testing and scheduling a memtest86* run on the next boot.
If you absolutely must remain in your currently booted OS, then memtester would probably be your tool of choice, although note that it's not a very precise memory test.
There are a lot of suggestions to use printf and echo, but there's one tiny difference: bash's printf is not capable of producing binary zeros, because its format string is truncated at the first NUL, while echo does the job properly. Consider these examples:
printf "\x31\x32\x00\x33\x00\x00\x00\x34" > printf.txt
echo -en "\x31\x32\x00\x33\x00\x00\x00\x34" > echo.txt
As a result, printf.txt has a size of 3 bytes (yep, it writes the first zero and stops), while echo.txt is 8 bytes long and contains the actual data.
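Putting the pieces together, here is a sketch that repeats a multi-byte pattern out to a chosen size (the function name, pattern, and count are made-up examples). Octal escapes are used because POSIX printf only guarantees \NNN octal, not \xHH:

```shell
#!/bin/sh
# write_pattern FILE COUNT: write COUNT copies of the 4-byte pattern
# 0xDE 0xAD 0xBE 0xEF (octal 336 255 276 357) to FILE.
write_pattern() {
    file=$1 count=$2
    : > "$file"                      # truncate the output file
    i=0
    while [ "$i" -lt "$count" ]; do
        printf '\336\255\276\357' >> "$file"
        i=$((i + 1))
    done
}

write_pattern pattern.bin 4          # produces a 16-byte file
```

For large files, one printf per iteration gets slow; a faster variant writes one chunk and replicates it with cat or dd.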

What is the best way to prevent out of memory (OOM) freezes on Linux?

Is there a way to make the OOM killer work and prevent Linux from freezing? I've been running Java and C# applications, where any memory allocated is usually used, and (if I'm understanding them right) overcommits are causing the machine to freeze. Right now, as a temporary solution, I added,
vm.overcommit_memory = 2
vm.overcommit_ratio = 10
to /etc/sysctl.conf.
Kudos to anyone who can explain why the existing OOM killer can't function correctly in a guaranteed manner, killing processes whenever the kernel runs out of "real" memory.
EDIT -- many responses are along the lines of Michael's "if you are experiencing OOM killer related problems, then you probably need to fix whatever is causing you to run out of memory". I don't think this is the correct solution. There will always be apps with bugs, and I'd like to adjust the kernel so my entire system doesn't freeze. Given my current technical understandings, this doesn't seem like it should be impossible.
Below is a really basic Perl script I wrote. With a bit of tweaking it could be useful. You just need to change the paths I use to the paths of your Java or C# processes. You could also change the kill commands I've used to restart commands.
Of course, to avoid typing perl memusage.pl manually, you could put it into your crontab file to run automatically. You could also use perl memusage.pl > log.txt to save its output to a log file. Sorry if it doesn't really help, but I was bored while drinking a cup of coffee. :-D Cheers
#!/usr/bin/perl -w
# Checks available memory and calculates the free size in MB.
# If free memory is below the minimum level specified, the
# script will attempt to shut the troublesome processes
# you specify down. If it can't, it will issue a -9 KILL signal.
#
# Uses external commands (cat and pidof)
#
# Cheers, insertable
our $memmin = 50;
our @procs = qw(/usr/bin/firefox /usr/local/sbin/apache2);

sub killProcs
{
    use vars qw(@procs);
    my @pids = ();
    foreach $proc (@procs)
    {
        my $filename = substr($proc, rindex($proc, "/") + 1, length($proc) - rindex($proc, "/") - 1);
        my $pid = `pidof $filename`;
        chop($pid);
        my @pid = split(/ /, $pid);
        push @pids, $pid[0];
    }
    foreach $pid (@pids)
    {
        # try to kill the process normally first
        system("kill -15 " . $pid);
        print "Killing " . $pid . "\n";
        sleep 1;
        if (-e "/proc/$pid")
        {
            print $pid . " is still alive! Issuing a -9 KILL...\n";
            system("kill -9 " . $pid);
            print "Done.\n";
        } else {
            print "Looks like " . $pid . " is dead\n";
        }
    }
    print "Successfully finished destroying memory-hogging processes!\n";
    exit(0);
}

sub checkMem
{
    use vars qw($memmin);
    my ($free) = $_[0];
    if ($free > $memmin)
    {
        print "Memory usage is OK\n";
        exit(0);
    } else {
        killProcs();
    }
}

sub main
{
    my $meminfo = `cat /proc/meminfo`;
    chop($meminfo);
    my @meminfo = split(/\n/, $meminfo);
    foreach my $line (@meminfo)
    {
        if ($line =~ /^MemFree:\s+(.+)\skB$/)
        {
            my $free = ($1 / 1024);
            &checkMem($free);
        }
    }
}

main();
If your process's oom_adj is set to -17, it won't be considered for killing, although I doubt that's the issue here.
cat /proc/<pid>/oom_adj
will tell you the oom_adj value of your process(es).
I put together a simple script that'll set the OOM score on launch. All sub-processes will inherit this score.
#!/usr/bin/env sh
if [ -z "$1" ] || [ -z "$2" ]; then
    echo "Usage: $(basename "$0") oom_score_adj command [args]..."
    echo "  oom_score_adj  A score between -1000 and 1000; bigger gets killed first"
    echo "  command        The command to run"
    echo "  [args]         Optional args for the command to run"
    exit 1
fi
set -eux
echo "$1" > /proc/self/oom_score_adj
shift
exec "$@"
The script sets the score of the current process to the first argument provided. This can be anything between -1000 and 1000, where 1000 is the most likely to get killed first. The remaining arguments are then executed as a command, replacing the current process.
I'd have to say the best way of preventing OOM freezes is to not run out of virtual memory. If you are regularly running out of virtual memory, or getting close, then you have bigger problems.
Most tasks don't handle failed memory allocations very well so tend to crash or lose data. Running out of virtual memory (with or without overcommit) will cause some allocations to fail. This is usually bad.
Moreover, before your OS runs out of virtual memory, it will start doing bad things like discarding pages from commonly used shared libraries, which is likely to make performance suck as they have to be pulled back in often, which is very bad for throughput.
My suggestions:
Get more ram
Run fewer processes
Make the processes you do run use less memory (This may include fixing memory leaks in them)
And possibly also
Set up more swap space
If that is helpful in your use-case.
Most multi-process servers run a configurable (maximum) number of processes, so you can typically tune it downwards. Multithreaded servers typically let you configure how much memory to use internally for buffers etc.
First off, how can you be sure the freezes are OOM killer related? I've got a network of systems in the field and I get not infrequent freezes, which don't seem to be OOM related (our app is pretty stable in memory usage). Could it be something else? Is there any interesting hardware involved? Any unstable drivers? High performance video?
Even if the OOM killer is involved, and worked, you'd still have problems, because stuff you thought was running is now dead, and who knows what sort of mess it's left behind.
Really, if you are experiencing OOM killer related problems, then you probably need to fix whatever is causing you to run out of memory.
I've found that fixing stability issues mostly relies on accurately identifying the root cause. Unfortunately, this requires being able to see what's happening when the issue happens, which is a really bad time to be trying to start various monitoring programs.
One thing I sometimes found helpful was to start a little monitoring script at boot time which would log various interesting numbers and snapshot the running processes. Then, in the event of a crash, I could look at the situation just before the crash. I sometimes found that intuition was quite wrong about the root cause. Unfortunately, that script is long out-of-date, or I'd give a link.
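A minimal version of such a boot-time logger might look like the sketch below (the function name, log path, interval, and choice of fields are all arbitrary; ps --sort is procps/GNU-specific):

```shell
#!/bin/sh
# Snapshot free memory and the top memory consumers. Run once here;
# in practice you'd loop it in the background from a boot script, e.g.
#   while :; do snapshot >> /var/log/mem-snapshot.log; sleep 60; done
snapshot() {
    date
    grep -E '^(MemFree|SwapFree):' /proc/meminfo
    ps -eo pid,rss,comm --sort=-rss | head -6    # 5 biggest RSS users
}

snapshot
```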

Read data from pipe and write to standard out with a delay in between. Must handle binary files too

I have been trying for about an hour now to find an elegant solution to this problem. My goal is basically to write a bandwidth-control pipe command which I could reuse in various situations (not just for network transfers; I know about scp -l 1234). What I would like to do is:
Delay for X seconds.
Read Y amount (or less than Y if there isn't enough) data from pipe.
Write the read data to standard output.
Where:
X could be 1..n.
Y could be 1 Byte up to some high value.
My problem is:
It must support binary data, which Bash can't handle well.
Roads I've taken or at least thought of:
Using a while read data construct: it filters out whitespace characters in the encoding you're using.
Using dd bs=1 count=1 in a loop. dd doesn't seem to have different exit codes for when there was something to read from if and when there wasn't, which makes it harder to know when to stop looping. This method should work if I redirect standard error to a temporary file and read it to check whether something was transferred (it's in the statistics printed on stderr), then repeat. But I suspect that would be extremely slow on large amounts of data, and if possible I'd like to avoid creating any temporary files.
Any ideas or suggestions on how to solve this as cleanly as possible using Bash?
Maybe pv -qL RATE?
-L RATE, --rate-limit RATE
Limit the transfer to a maximum of RATE bytes per second. A
suffix of "k", "m", "g", or "t" can be added to denote kilobytes
(*1024), megabytes, and so on.
It's not very elegant, but you can use a redirection trick to capture the number of bytes copied by dd and then use it as the exit condition for a while loop:
while [ -z "$byte_copied" ] || [ "$byte_copied" -ne 0 ]; do
    sleep "$X"
    byte_copied=$(dd bs="$Y" count=1 2>&1 >&4 | awk '$2 ~ /^bytes?$/ {print $1}')
done 4>&1
However, if your intent is to limit the transfer throughput, I suggest you to use pv.
Do you have to do it in bash? Can you just use an existing program such as cstream?
cstream meets your goal of a bandwidth controlled pipe command, but doesn't necessarily meet your other criteria with regard to your specific algorithm or implementation language.
What about using head -c?
head -c 10 /dev/zero > test.out
Gives you a nice 10-byte file.
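For completeness, the dd-statistics idea above can be wrapped into a reusable, binary-safe throttle function. This is a sketch: it parses GNU dd's "N bytes ... copied" stderr line, so it won't work as-is with BSD dd, and the function name and chunk/delay values are examples:

```shell
#!/bin/sh
# throttle CHUNK DELAY: copy stdin to stdout, CHUNK bytes every DELAY
# seconds. Binary-safe because the data itself goes straight through
# fd 3; only dd's stderr statistics are captured in a variable.
throttle() {
    chunk=$1 delay=$2
    while :; do
        stats=$(dd bs="$chunk" count=1 2>&1 >&3)
        copied=$(printf '%s\n' "$stats" | awk '$2 ~ /^bytes?$/ {print $1}')
        [ "${copied:-0}" -gt 0 ] || break    # 0 bytes copied: end of input
        sleep "$delay"
    done 3>&1
}

printf 'hello world' | throttle 4 1    # emits up to 4 bytes per second
```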
