Read data from pipe and write to standard out with a delay in between. Must handle binary files too - linux

I have been trying for about an hour now to find an elegant solution to this problem. My goal is basically to write a bandwidth control pipe command which I could re-use in various situations (not just for network transfers, I know about scp -l 1234). What I would like to do is:
Delay for X seconds.
Read Y amount (or less than Y if there isn't enough) data from pipe.
Write the read data to standard output.
Where:
X could be 1..n.
Y could be 1 Byte up to some high value.
My problem is:
It must support binary data which Bash can't handle well.
Roads I've taken or at least thought of:
Using a while read data construct: it strips whitespace characters in whatever encoding you're using.
Using dd bs=1 count=1 and looping. dd doesn't seem to return different exit codes depending on whether there was anything left to read from if, which makes it harder to know when to stop looping. This method should work if I redirect standard error to a temporary file and read it to check whether anything was transferred (the count is in the statistics printed on stderr), then repeat. But I suspect it would be extremely slow on large amounts of data, and if possible I'd like to avoid creating temporary files.
Any ideas or suggestions on how to solve this as cleanly as possible using Bash?

Maybe pv -qL RATE?
-L RATE, --rate-limit RATE
Limit the transfer to a maximum of RATE bytes per second. A
suffix of "k", "m", "g", or "t" can be added to denote kilobytes
(*1024), megabytes, and so on.
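For example, a minimal sketch (the path, host name, and rate are made-up values) that caps a tar-over-ssh copy at 100 KiB per second:
tar -cf - /some/dir | pv -qL 100k | ssh somehost 'tar -xf -'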

It's not very elegant, but you can use a redirection trick to capture the number of bytes copied by dd and use it as the exit condition of a while loop:
while [ -z "$byte_copied" ] || [ "$byte_copied" -ne 0 ]; do
  sleep "$X"
  # dd's data goes to fd 4 (the real stdout); its stderr statistics go down the pipe to awk,
  # which pulls the byte count out of the "N bytes copied" (or "1 byte copied") summary line
  byte_copied=$(dd bs="$Y" count=1 2>&1 >&4 | awk '$2 ~ /^bytes?$/ {print $1}')
done 4>&1
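As a usage sketch, you can wrap the loop in a function and drop it into a pipeline (the throttle name and the example values are made up):
throttle() {
  local X=$1 Y=$2 byte_copied=
  while [ -z "$byte_copied" ] || [ "$byte_copied" -ne 0 ]; do
    sleep "$X"
    byte_copied=$(dd bs="$Y" count=1 2>&1 >&4 | awk '$2 ~ /^bytes?$/ {print $1}')
  done 4>&1
}
# e.g.: cat big.bin | throttle 1 65536 > copy.bin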
However, if your intent is to limit the transfer throughput, I suggest you use pv.

Do you have to do it in bash? Can you just use an existing program such as cstream?
cstream meets your goal of a bandwidth controlled pipe command, but doesn't necessarily meet your other criteria with regard to your specific algorithm or implementation language.
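For instance, a hedged sketch (the file names are made up; -t is cstream's throughput limit in bytes per second):
cat big.iso | cstream -t 131072 > /mnt/backup/big.iso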

What about using head -c ?
cat /dev/zero | head -c 10 > test.out
Gives you a nice 10-byte file.
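Applied to the original delay/read/write goal, a hedged sketch that stays binary-safe by never storing the data in a shell variable (it assumes GNU head and Linux's /dev/fd; the rate_pipe name and the example values are made up):
rate_pipe() {
  # read up to $2 bytes every $1 seconds, stopping once the source is exhausted
  local delay=$1 chunk=$2 n
  while :; do
    sleep "$delay"
    # tee copies the chunk to fd 3 (the real stdout) while wc -c counts the bytes read
    n=$(head -c "$chunk" | tee /dev/fd/3 | wc -c)
    [ "$n" -eq 0 ] && break
  done 3>&1
}
# usage: producer | rate_pipe 1 4096 | consumer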

Related

Does tee forward data that has not made it into the file?

I'm zeroing a new hard disk like so:
pv /dev/zero | tee /dev/sdb | sha1sum -
The idea is that I will zero the disk and simultaneously compute a checksum of however many zeros got written. Then I'll sha1sum the block device and see if it matches the data that I originally wrote to it.
The question is, what happens when "tee" runs out of space on the device and terminates? Say the block device is exactly 1 million bytes; tee will obviously fill it with 1 million zero bytes, but will it forward exactly 1 million zero bytes to sha1sum?
Answer to the original question:
No, tee will not stop writing to stdout at precisely the point at which a write to a file specified in an argument fails.
But I don't see that it matters much. It appears that your goal is to ensure that the entire disk has been overwritten with zeros, without worrying about how big the disk is. So reading the disk and comparing every block read to a block of zeros should suffice. You can do that with cmp /dev/sdb /dev/zero. If it says "EOF on /dev/sdb", then all the bytes were 0.
For what it's worth, I thought of another way to do the same thing, albeit a little indirectly:
pv /dev/zero | dd bs=100M of=/dev/sdb 2> log
dd's report (captured in "log") should contain a precise count of bytes written, and you can use that to compute the sha1sum (or, alternatively, diff the block device against a generated stream of exactly that many zeros).
(bs=100M is because dd's default block size is 512 bytes, which turns out not to be performant in my use case)
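As a hedged follow-up sketch (it assumes GNU dd's "N bytes (...) copied" summary line in the log file):
# pull the byte count out of dd's summary, then hash exactly that many zeros
bytes=$(awk '/copied/{print $1; exit}' log)
head -c "$bytes" /dev/zero | sha1sum
# compare the result against: sha1sum /dev/sdb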

How to split a large variable?

I'm working with large variables, and it can be very slow "looping" through them with while read line. I found out that the smaller the variable, the faster it works.
How can I split a large variable into smaller variables and then read them one by one?
For example, what I would like to achieve:
bigVar=$(echo "$bigVar" | split_var)
for var in "${bigVar[@]}"; do
while read line; do
...
done <<< "${var}"
done
Or maybe split it into bigVar1, bigVar2, bigVar3, etc., and then read them one by one.
Instead of doing
bigVar=$(someCommand)
while read line
do
...
done <<< "$bigVar"
Use
while read line
do
...
done < <(someCommand)
This way, you avoid the problem with big variables entirely, and someCommand can output gigabyte after gigabyte with no problem.
If the reason you put it in a variable was to do work in multiple steps on it, rewrite it as a pipeline.
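A hedged sketch of that rewrite (someCommand, step1, and step2 are placeholder names):
# instead of: bigVar=$(someCommand) followed by several steps on "$bigVar"
someCommand | step1 | step2 > result
# or, if two independent steps need the same stream, fan it out with tee:
someCommand | tee >(step1 > result1) | step2 > result2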
If bigVar is made of words, you could use xargs to split it into lines no longer than the maximum length of a command line, usually 32 kB or 64 kB:
someCommand | xargs | while read line
do
...
done
In this case xargs uses its default command, which is echo.
I'm curious about what you want to do in the while loop, as it may be optimized with a pipeline.

Ksh cannot allocate memory linux

I have two big arrays of strings (each of them has ~90000 elems).
I create them with the set -A command.
And I need to figure out which strings in the first array don't have an equal string in the second.
My code:
for i in {0..${#hard_drive_files[*]}}; do
has_reference=false
for j in {0..${#files_in_db[*]}}; do
if [[ ${files_in_db[j]} == ${hard_drive_files[i]} ]]; then
has_reference=true
break
fi
done
if [[ $has_reference == false ]]; then
echo "${hard_drive_files[i]}"
fi
done
This part of the code "eats" too much memory. At the end of execution, the amount of used memory is ~80000 MB.
After this part of the code I try to archive some files, but I get cannot fork [Cannot allocate memory].
Is there a solution for such a problem?
P.S.
kshVersion=Version AJM 93t+ 2010-02-02
To figure out how much RAM is used, I execute free -m
I assume there is a specific reason to do it in ksh? Try to approximate how much memory such a table needs, and compare that to the amount of RAM and swap. I bet it's not the program specifically, but some per-process memory/swap usage limit in limits.conf or sysctl.conf.
You could also try to split the data into groups, by some concept like the first letter of the name, to decrease the amount of memory needed. Your code is probably far from optimal; it would be better to gather all the information you need first and then reuse it, instead of repeating the whole procedure in a nested loop the way you're doing.
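For illustration, a hedged sketch of the same comparison without the nested loop (it assumes the file names contain no newlines; the temporary paths are arbitrary):
# write both lists out once, sorted, and let comm find entries only in the first
printf '%s\n' "${hard_drive_files[@]}" | sort > /tmp/on_disk
printf '%s\n' "${files_in_db[@]}" | sort > /tmp/in_db
comm -23 /tmp/on_disk /tmp/in_db   # names present on disk but missing from the db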

Write a certain hex pattern to a file from bash

I am doing some memory tests and am trying to write a certain hex pattern to a regular file from bash. How would I go about doing this without using the xxd or hexdump tool/command?
Thanks,
Neco
The simplest thing is probably:
printf '\xde\xad\xbe\xef' > file
but it is often more convenient to do
perl -e 'print pack "H*", "deadbeef"' > file
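If the point is a memory-test pattern of some useful size, a hedged sketch along the same lines (the 16 MiB size, the deadbeef pattern, and the file name are arbitrary examples):
# repeat the 4-byte pattern until the file is 16 MiB long
perl -e 'print pack("H*", "deadbeef") x (16 * 1024 * 1024 / 4)' > pattern.bin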
If I get your question correctly, printf should do:
$ printf %X 256
100
Can you use od -x instead? That's pretty universally available; od has been around since the dawn of time.[1]
[1] Not really the dawn of time.
There are multiple ways to do this in bash. One of them is echo with escape sequences such as \x31, where '\' keeps bash from treating the next character literally and 'x' marks it as a hex number:
echo -en '\x31\x32\x33' > test
-e interprets the backslash escapes
-n avoids the trailing newline (otherwise 0x0A would be appended at the end)
Memory testing is a much more complex subject than just writing/reading patterns in memory. It puts pretty hard limits on what a testing program can do and on what state the whole system is in. Technically, it's impossible to test 100% of memory while running a regular OS at all.
On the other hand, you can run some real test program from a shell, or schedule a test execution on the next boot with some clever hacking around. You might want to take a look at how it's done in Inquisitor, i.e. running memtester for in-OS testing and scheduling a memtest86* run on the next boot.
If you absolutely must remain in your currently booted OS, then memtester would probably be your tool of choice, although note that it's not a very precise memory test.
There are a lot of suggestions to use printf and echo, but there's one tiny difference: printf is not capable of producing zeros (binary zeros), while echo does the job properly. Consider these examples:
printf "\x31\x32\x00\x33\x00\x00\x00\x34">printf.txt
echo -en "\x31\x32\x00\x33\x00\x00\x00\x34">echo.txt
As a result, printf.txt has a size of 3 bytes (yep, it writes the first zero and stops), while echo.txt is 8 bytes long and contains the actual data.

File output redirection in Linux

I have two programs A and B. I can't change the program A - I can only run it with some parameters, but I have written the B myself, and I can modify it the way I like.
Program A runs for a long time (20-40 hours) and during that time it produces output to the file, so that its size increases constantly and can be huge at the end of run (like 100-200 GB). The program B then reads the file and calculates some stuff. The special property of the file is that its content is not correlated: I can divide the file in half and run calculations on each part independently, so that I don't need to store all the data at once: I can calculate on the first part, then throw it away, calculate on the second one, etc.
The problem is that I don't have enough space to store such big files. I wonder if it is possible to somehow pipe the output of A into B without storing all the data at once and without creating huge files. Is it possible to do something like that?
Thank you in advance, this is crucial for me now, Roman.
If program A supports it, simply pipe.
A | B
Otherwise, use a fifo.
mkfifo /tmp/fifo
ls -la > /tmp/fifo &
cat /tmp/fifo
EDIT: Adjust buffer sizes with ulimit -p and then:
cat /tmp/fifo | B
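Tied back to the question, a hedged sketch (A and B stand in for the real programs; the fifo path is arbitrary, and if A takes an output file name as a parameter you can pass it the fifo path instead of redirecting):
mkfifo /tmp/ab_fifo
A > /tmp/ab_fifo &     # A writes into the fifo as if it were a regular file
B < /tmp/ab_fifo       # B reads the data as A produces it; nothing large ever hits the disk
rm /tmp/ab_fifo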
It is possible to pipe the output of one program into another.
Read here for the syntax and know-how of Unix pipelining.
You can use socat, which can take stdout and feed it to the network, or take data from the network and feed it to stdin.
Named or unnamed pipes have the problem of a small buffer (4 KiB on older kernels, 64 KiB on modern Linux), which means a lot of process context switches if you are writing multiple gigabytes.
Or, if you are adventurous enough, you can LD_PRELOAD a shared object into process A and trap its open/write calls to do whatever you like.
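As a hedged sketch of the socat route (the host name and port are made-up examples):
# on the machine that runs B:
socat TCP-LISTEN:9000,reuseaddr - | B
# on the machine that runs A:
A | socat - TCP:receiver.example.com:9000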
