Can I get a faster output pipe than /dev/null? - linux

I am running a huge task [automated translation scripted with Perl + a database, etc.] that will run for about two weeks non-stop. While thinking about how to speed it up, I noticed that the translator prints everything (every translated sentence, plus assorted progress info) to STDOUT all the time. This makes it visibly slower when the output lands on the console.
I obviously piped the output to /dev/null, but then I thought: could there be something even faster? There is so much output that it would really make a difference.
And that's the question I'm asking you, because as far as I know there is nothing faster... (But I'm far from being a guru; I've only been using Linux on a daily basis for the last three years.)

Output to /dev/null is implemented in the kernel, which is pretty bloody fast. The output pipe isn't your problem; it's the time it takes to build the strings that are getting sent to /dev/null. I would recommend going through the program and commenting out (or guarding with if $be_verbose) all the useless print statements. I'm pretty sure that will give you a noticeable speedup.

I'm able (via dd) to dump 20 gigabytes of data per second down /dev/null. This is not your bottleneck :-p
Pretty much the only way to make it faster is to not generate the data in the first place: remove the logging statements entirely. The cost of producing all those log messages likely exceeds the cost of throwing them away by quite a bit.
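If you want a rough feel for what /dev/null can absorb on your own machine, a quick dd test along these lines works (the block size and count are only illustrative, and /dev/zero just serves as a cheap data source):

dd if=/dev/zero of=/dev/null bs=1M count=20000

The rate it reports is an upper bound on the sink; your Perl program will be limited by string formatting long before it gets anywhere near that number.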

Unrelated to Perl and standard output, but there is the null_blk block device, which is even faster than /dev/null. Basically, it is bounded by syscall performance, and with large blocks it can saturate the memory bus.
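A minimal sketch of trying null_blk out, assuming the module is available in your kernel (it creates devices named /dev/nullb0, /dev/nullb1, ...; the dd parameters are only illustrative):

sudo modprobe null_blk nr_devices=1
sudo dd if=/dev/zero of=/dev/nullb0 bs=1M count=20000 oflag=direct

oflag=direct bypasses the page cache, so the measurement reflects the block device itself rather than memory copies into cache.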

Related

Read contents of large files without using "cat" command in linux

I'm looking for a more efficient way of reading file contents in Linux without using the "cat" command, especially for larger files, since in such cases cat drives up memory and CPU usage on the server.
One thing that comes to mind is using grep -v "character-set-which-is-unlikely-in-the-file" filename
But picking a different character set every time and hoping it doesn't appear in the file wouldn't be efficient.
Any other thoughts?
If you just want to read through the file, so it gets cached, the simplest way is perhaps this:
cat filename > /dev/null
Note that you don't need to show the data on screen to read it from disk. That command reads the file, and ignores the content by dumping it in /dev/null, but it still reads all the data.
If the CPU load goes up, that is probably a good thing, meaning that the computer is working hard, and will be finished sooner rather than later. If it crashes, though, there is something else wrong.
If you have some specific reason not to use the "cat" command, you can try "dd" instead, but it is more complicated to write and will not be faster:
dd if=filename of=/dev/null bs=1M
Addendum:
This inspired me to run some tests. On my particular computer both "cat" and "dd" took 24.27-24.31 seconds to read a large file on a mechanical disk when it wasn't already cached, and 0.39-0.40 seconds when it was cached. (Three tests of each case, with very little variability.)
Both these programs contain code to write the data, even if it is dumped to /dev/null, so one could expect that a program specifically written to just read would be slightly faster, but I got the same times when I tried that.
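For anyone wanting to reproduce that kind of comparison, a rough sketch (dropping the page cache requires root and flushes cached data for the whole system, so don't do it on a busy production box):

sync
echo 3 | sudo tee /proc/sys/vm/drop_caches
time cat filename > /dev/null     # cold cache: limited by the disk
time cat filename > /dev/null     # warm cache: limited by memory/CPU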

Minimizing IOPS with Fortran

I have a Fortran program that writes out a large amount of ASCII data, one line at a time, and there is some concern from system admins (and evidence from my runs) that this is adversely affecting system performance. I/O generally works better with fewer big writes than with many small writes, so I'd like the program to minimize the number of IOPS by writing out bigger chunks of data, without changing the format of the output file (this is a large body of software with lots of related software depending on the assumed file formats). I had thought that turning a loop like this:
nwrite = 100000000   ! total number of lines to write
do cnt = 1, nwrite
  write(11,'(i22,3x,f16.14)') cnt, numar(cnt)
enddo
into a loop like this:
nwrite = 100000000
nblock = 10000       ! number of lines to write in each block
do cnt = 1, nwrite/nblock
  write(11,'(i22,3x,f16.14)') (nblock*(cnt-1)+j, numar(nblock*(cnt-1)+j), j=1,nblock)
enddo
would do the trick. But I made two small scripts doing the above and they didn't show any real difference in run time. It's a fairly major time commitment to make the change in the actual code, so I'd like to be fairly sure before committing to an approach. I haven't completely unrolled the loop into a single write command because that might not work well for my current problem, though approaches that do this are also welcome.
Can anyone confirm whether the above code would reduce the actual number of write commands or what else might achieve what I'm looking for? Thanks in advance.
Based on input from other users and further reading: Fortran leaves control over this to the compiler, so the behavior is compiler-dependent. Buffered writes are the default behavior for the Portland Group Fortran compiler, and it looks to be the same for gfortran. The Intel compiler does not buffer file I/O by default; adding the option -assume buffered_io makes file I/O buffered by default.
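As a sketch of what that looks like on the command line with the Intel compiler (the source and program names are placeholders; FORT_BUFFERED is Intel's run-time environment variable, which as far as I know has the same effect as the compile-time option):

ifort -assume buffered_io -o writer writer.f90   # enable buffering at compile time

export FORT_BUFFERED=true                        # or enable it at run time
./writer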

How to create a log file that "pop_front"s?

Suppose I have a console program that outputs trace/debug lines on stdout, and that I want to run on a server.
And then I do:
./serverapp > someoutputfile
If I need to see how the program's doing, I would just log into the server and do:
tail -f someoutputfile
However, understandably over time, someoutputfile would become pretty big.
Is there a way to make it so that someoutputfile is limited to a certain size and contains only the most recent output?
I mean, the hard way would be to make a custom script/program that cycles the output between different files, but that seems like overkill.
You can truncate the log file. One way to do this is to type:
>someoutputfile
at the shell command-line. It's a redirection with no command attached, and it will erase all the contents of the file.
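If you prefer an explicit command, coreutils ships truncate, which does the same thing:

truncate -s 0 someoutputfile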
The tricky bit here is that any program writing to that file will continue to write into the file at its last output position. So the file will immediately gain a "hole" from 0 to X bytes, where X is the output position.
In most Linux file systems these holes result in sparse files, which don't actually use the space in the hole. So the file may contain many gigabytes of 0's at the beginning but only use 500 KB on disk.
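You can see this for yourself by comparing the apparent size with the space actually allocated on disk:

ls -lh someoutputfile   # apparent size, including the hole
du -h someoutputfile    # blocks actually used on disk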
Another way to do fast logging is to memory map a file on disk of fixed size: 16 MB for example. Then the logging writes into a memory pointer which wraps around when it reaches the size limit. It then continues to write at the front of the file. It's a good idea to have some kind of write position marker. I use <====>, for example. I find this method to be ridiculously fast and great for debug logging.
I haven't used it myself, but it gets good reviews here on SO: try logrotate.
A more general discussion of managing output files may show you that a custom script/solution is not out of the question ;-) : Problem with Bash output redirection
I hope this helps.

How to only allow certain text to be printed on the Emacs console?

This question may not be related only to Emacs, but to any development environment that uses a console for its debugging process. Here is the problem: I use Eshell to run the application we are developing. It's a J2ME application, and for debugging we just use System.out.println(). Now, suppose I want to allow only text that starts with Eko: to be displayed in the console (interactively); is that possible?
I installed Cygwin in my Windows environment and tried to grep the output like this:
run | grep Eko:. It does filter out everything except lines beginning with Eko:, but it's not interactive; the output is held back until the application quits. Well, that's useless anyway.
Is it possible to do this without touching the application code itself?
I tagged linux as well, because maybe some folks on the Linux side know the answer.
Many thanks!
The short: Try adding --line-buffered to your grep command.
The long: I assume that your application is flushing its output stream with every System.out.println(), and that grep has the lines available to read immediately, but is choosing to buffer its output until it has saved up 'enough' to make a write worthwhile. (This is typically 4k or 8k of data, which could be several hundred lines, depending upon your line length.)
This buffering makes great sense when the output is another program in the pipeline; reducing needless context switches is a great way to improve program throughput.
But if your printing is slow enough that it doesn't fill the buffer quickly enough for 'real time' output, then switching to line-buffered output might fix it.
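Put together with the pipeline from the question, that would be (stdbuf is a separate GNU coreutils tool that forces line buffering on a producer's stdout; whether it helps here depends on how run behaves under Eshell/Cygwin, so treat it as a second thing to try):

run | grep --line-buffered Eko:

stdbuf -oL run | grep Eko:   # alternative if the producer itself is block-buffering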

Debugging under Linux: Is there a pseudo-tty-like circular buffer implementation?

I am developing under Linux with pretty tight constraints on disk usage. I'd like to be able to point logging to a fixed-size file. For example, if my application outputs all logs to stdout:
~/bin/myApp > /dev/debug1
and then, to see the last amount of output:
cat /dev/debug1
would write out however many bytes debug1 was set up to save (if at least that many had been written there).
This post suggests using expect or its library, but I was wondering if anyone has seen a "pseudo-tty" device driver-type implementation as I would prefer to not bind any more libraries to my executable.
I realize there are other mechanisms like logrotate, but I'd prefer to have a non-cron solution.
Pointers, suggestions, questions welcome!
Perhaps you could achieve what you want using mkfifo and something that reads the pipe with a suitable buffer. I haven't tried, but less --buffers=XXXXXX might work for this.
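An untested sketch of that idea (the pipe path is arbitrary; per the less man page, -B keeps less from growing its buffers without limit when reading a pipe, so only roughly the --buffers amount, in kilobytes, of recently read data is kept around):

mkfifo /tmp/debug1
~/bin/myApp > /tmp/debug1 &
less -B --buffers=1024 /tmp/debug1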
