Testing IO performance in Linux [closed]

How do I test IO performance in Linux?

IO and filesystem benchmarking is a complex topic. No single benchmarking tool is good in all situations. Here is a brief overview of different benchmarking tools:
Block Storage:
IOMeter - Highly customizable and lets you coordinate multiple clients. Needs a Windows PC for the coordination application. Developed by Intel. On Linux, take the maximum rates of older versions (at least 2006.07.27 and earlier) with a pinch of salt, because the submission method was not optimal.
File System (synthetic):
FFSB - Flexible Filesystem Benchmark. A very neat benchmarking tool for Linux, with good workload customization. The NFS benchmarking (net-ffsb) is a bit unsound.
Filebench - Extremely powerful, but originally developed for Solaris. Linux support isn't good.
sysbench - Mainly a DB benchmarking tool, but also includes a basic filesystem benchmark.
bonnie - Seems to be obsolete.
bonnie++ - C++ port of bonnie. Easy to use, but does not seem to be very customizable.
File System (workload):
Postmark - Simulates the IO behavior of a mail server. Too small to stress good IO systems.
Stony Brook University and IBM Watson Labs have published a highly recommended journal paper in ACM Transactions on Storage about file system benchmarking, in which they present different benchmarks and their strong and weak points: A nine year study of file system and storage benchmarking. The article clearly points out that the results of most benchmarks are at least questionable.
A note: Is the question programming related? Maybe not, but maybe it is. I spend a lot of time benchmarking the IO performance of the systems I develop. At least for me, questions about how to benchmark these things are highly programming related. Please do not close all questions that are not development/programming related from your point of view; the point of view of other developers might be different.

tool: fio
link: http://freshmeat.net/projects/fio/
test physical disk IO:
./fio examples/disk-zone-profile
set parameters:
sequential r/w: rw=read or rw=write
random r/w: rw=randread or rw=randwrite
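For repeatable runs you can also put everything in a job file instead of flags. A minimal sketch (the file name, path, and sizes here are assumptions; tune them to your hardware):
[global]
ioengine=libaio
direct=1
bs=4k
size=1G
runtime=60
time_based
filename=/tmp/fio-testfile

[randread-job]
rw=randread
iodepth=16
Run it with ./fio myjob.fio (any job file name works).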

If you need a quick way without the hassle of installing anything, this is the method I use for a write speed test:
dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync
And the output is something like this
root#rackserver:/# dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 4.86922 s, 221 MB/s
Also: delete the test file afterwards to recover the space used.
Some explanation:
bs = block size
count = the number of blocks to be written
Adjust these parameters to change the size of the file written, according to your server specs and the amount of time you want to spend writing.
The read speed, as gtsouk already suggested, can be checked by using /dev/null as the output.

dd if=/dev/sda of=/dev/null
Let this run for a few minutes and stop it with Ctrl+C. It will print the read transfer speed of your drive/controller. This is the maximum read speed you can get out of your drive.
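If you instead want to re-read the test file from the write test above without the page cache skewing the result, drop the caches first (a quick sketch; requires root):
sync
echo 3 > /proc/sys/vm/drop_caches
dd if=test of=/dev/null bs=64k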

sysbench
See http://www.howtoforge.com/how-to-benchmark-your-system-cpu-file-io-mysql-with-sysbench
Example
sysbench --test=fileio --file-total-size=150G prepare
sysbench --test=fileio --file-total-size=150G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
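When you are done, remove the test files it created (same legacy option syntax as above):
sysbench --test=fileio --file-total-size=150G cleanup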
It can also test CPU, memory, threads, and database server performance.
It's awesome.
Or, for testing software written in Java: http://www.dacapobench.org/

You need to specify what you're testing for; otherwise benchmarks will only mislead. There are different aspects of IO performance that you need to choose to optimize for, and different parameters to play with. (A fio sketch after the lists below shows how the test parameters map to concrete flags.)
Your system parameters:
storage device: HDD, SSD (which?), Raid (which?)
filesystem, block size, journal mode
file cache, dirty thresholds, amount of memory
IO scheduler, its tunables
number of CPUs
kernel version
Your test parameters:
read or write performance?
sequential or random?
1 thread or multiple?
size of requests
optimize for throughput or request delay?
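As mentioned above, here is a sketch of how those test parameters map onto fio flags (the path and values are assumptions; vary one parameter at a time):
fio --name=randread-test --filename=/tmp/fio.dat --size=1G \
    --rw=randread --bs=4k --numjobs=4 --iodepth=16 \
    --ioengine=libaio --direct=1 --runtime=60 --time_based --group_reporting
rw covers the read/write and sequential/random choices, bs is the request size, and numjobs the thread count; a high iodepth favors throughput, while iodepth=1 exposes per-request latency.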

There is an excellent program to test block storage IO on Unix called IORATE. You can get a copy at iorate.org.
It can generate complex mixed IO, including re-use (hits) and hot zones for tiered storage testing.

Take a look at IOzone:
http://www.iozone.org/
If you would like to read a whitepaper illustrating real-world usage on an HPC cluster, please see this pdf, page 36:
http://i.dell.com/sites/content/business/solutions/hpcc/en/Documents/Dell-NSS-NFS-Storage-solution-final.pdf

Related

I use FreeBSD 11 but IOPS are very poor ('fio' tool)

I use the 'fio' disk tool to test speed. The disk is an Intel SSD; TRIM and 4k alignment are enabled.
The hardware is a Dell R610.
The disk controllers are AHCI and an LSI 9211-8i (firmware P20, IT mode); sysctl settings are at their defaults.
The FreeBSD file system is ZFS or UFS; on CentOS it is XFS.
If I install CentOS 7 and run the same 'fio' test, there is no problem.
The command is:
fio -filename=/mnt/test.fio_test_file -direct=1 -iodepth=1 -thread -rw=randread -ioengine=psync -bs=4k -size=1G -numjobs=1 -runtime=30 -group_reporting -name=pleasehelpme
(also tested with -rw=randrw, -rw=randwrite, and -numjobs=64)
FreeBSD speed:
[59172KB/0KB/0KB /s] [14.8K/0/0 iops]
CentOS 7 speed:
[248.5MB/0KB/0KB /s] [63.5K/0/0 iops]
That's close to 5 times the speed!
Testing randrw and randwrite shows the same problem.
But plain sequential read or write is very good, even faster than CentOS.
I have not tried other tools on FreeBSD 11; maybe it is a fio problem, but I'm not sure...
So why? And how do I fix it?
Update 2016-12-06:
I read https://github.com/axboe/fio/ .
I suspected a fio problem, but a PostgreSQL test (with the same configuration on both systems) also gives very different TPS numbers between the two systems.
It looks like FreeBSD genuinely performs poorly here, rather than this being a fio problem.
Maybe a configuration problem? I do not know...
Update 2017-01-08:
I have given up on FreeBSD 11 and switched to CentOS 7.
FreeBSD 11 performance should be great, but it is not. Maybe my configuration is wrong, but I cannot fix this disk IOPS problem, so I had to give up.
QAQ... If you can fix this problem, please tell me.
Thank you very much.
In some cases, depending on the hardware, FreeBSD may need some adjustments. Sometimes the issue is with the controller (Dell PERC); in other cases a simple kernel flag can help.
From https://wiki.freebsd.org/BenchmarkAdvice
Parallel read/write tests
If you do a FS/disk I/O test where writes and reads are interleaved / in parallel, you need to be aware that FreeBSD prioritizes writes over reads.
Check the vfs.hidirtybuffers sysctl; generally, lower it in order to force out dirty pages earlier and thus reduce the number that fsync has to deal with.
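For example (a sketch; the right value depends on your RAM size and workload):
sysctl vfs.hidirtybuffers            # inspect the current value
sysctl vfs.hidirtybuffers=2048       # lower it so dirty buffers get flushed earlier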
Benchmarking ZFS
If you want to benchmark ZFS, be aware that it only shines if you are willing to spend money. Using ZFS on one or two disks will not give improved performance (compared to e.g. UFS), but it will give improved safety for your data (you know when your data is damaged by e.g. radiation or data-manipulating hard disk errors). To make it shine you need to add a lot of RAM, or at least one read-optimized SSD as an L2ARC cache for read performance (the number of SSDs depends upon the size of the working set), or two mirrored write-optimized SSDs for the ZIL (mirrored for data safety in case one SSD gets damaged) for synchronous (DBs/NFS/...) write performance.
Try to use diskinfo
diskinfo -t /dev/ada0
The -t option triggers a simple and rather naive benchmark of the disk's seek and transfer performance.
For ZFS: https://wiki.freebsd.org/ZFSTuningGuide

What is the proper way to test NFS performance? [closed]

I've been given a project whose only objective is to monitor a network's NFS performance. I know there are a bunch of open source tools out there, but I would still like to understand the basic idea behind them in order to better tweak them. The network consists of a few hundred Linux systems and a few thousand accounts with NFS-mounted home dirs; the script can be pushed out to every station, and running it on the server is also possible if that does any good. AFAIK, essentially all the script should do is a few dd's and watch the IO rate over NFS. And my question is just: what is the proper way of doing so? Do I add a new account to the system solely to run the scripts? Some general thoughts are greatly appreciated :)
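Something like this minimal sketch is what I have in mind (paths and sizes are placeholders; conv=fdatasync makes dd wait for the server before reporting a rate):
#!/bin/sh
TESTFILE="$HOME/.nfs_iotest.$$"
# write test: push 256 MB over NFS and report throughput
dd if=/dev/zero of="$TESTFILE" bs=1M count=256 conv=fdatasync 2>&1 | tail -n 1
# read test: beware the client-side cache; remount or drop caches for cold numbers
dd if="$TESTFILE" of=/dev/null bs=1M 2>&1 | tail -n 1
rm -f "$TESTFILE"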
Bonnie
A classic performance evaluation tool. The main program tests database-type access to a single file (or a set of files, if you wish to test more than 1G of storage), and it tests the creation, reading, and deletion of small files, which can simulate the usage of programs such as Squid, INN, or Maildir-format email.
Relevance to NFS: performance testing, workload
DBench
Dbench was written to allow independent developers to debug and test SAMBA. It is heavily inspired by the original SAMBA tool: NetBench.
Like NetBench, it allows you to:
torture the file system
generate network load independently of the disk IO
measure performance
But it does not need as many hardware resources as NetBench to run.
Relevance to NFS:
IOZone
A performance test suite. POSIX compliant and 64-bit capable. This is the file system test from the L.S.E. Main features:
POSIX async I/O, Mmap() file I/O, Normal file I/O
Single stream measurement, Multiple stream measurement, Distributed file server measurements (Cluster)
POSIX pthreads, Multi-process measurement
Selectable measurements with fsync, O_SYNC
Latency plots
Relevance to NFS:: Performance testing. Good for exercising a given mount point under various load conditions.
Full details can be found here: http://wiki.linux-nfs.org/wiki/index.php/Testing_tools

Linux CPU Usage Tools

Background
I've written a tool to capture CPU usage on a per-thread basis. The output of the tool is a binary file that I pump into a parsing utility I wrote, and the output of the parsing utility is a CSV file I can import into Excel to chart pretty graphs of process/thread CPU usage.
This CPU usage capture tool runs on an embedded ARM platform with a Linux kernel based on 2.6.35.3. That being said, I was concerned about keeping the tool lightweight. I didn't want it to store directly to a CSV file, in order to minimize the processing time and the file size of the captured data.
Question
The tool works, but I'm wondering if I took the long way around the problem? Is there already a tool out there that does this (or something like it)?
You're probably wondering why I care if I already made a tool that works. Well, it's not as lightweight as I'd like: it takes up about 10% of the CPU. As a benchmark, top only takes about 1% (max).
Update
I've decided to continue using my tool for now, at least until a better solution becomes available. I was able to shave off a couple of percentage points by using open() instead of fopen() on /proc/stat, and read() instead of fgets().
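In case it helps anyone, the core of that change looks roughly like this (a sketch, not my exact code):
#include <fcntl.h>
#include <unistd.h>

/* Read all of /proc/stat with raw syscalls; returns bytes read or -1. */
static long read_proc_stat(char *buf, long bufsize)
{
    int fd = open("/proc/stat", O_RDONLY);
    if (fd < 0)
        return -1;
    long n = read(fd, buf, bufsize - 1);
    close(fd);
    if (n >= 0)
        buf[n] = '\0';  /* terminate so the parser can treat it as a string */
    return n;
}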
IBM has a tool called nmon which does the same (for AIX & Linux). According to IBM's documentation, it takes ~2% CPU. You may want to look at that.
Comparing nmon with your tool should give you a fair idea of your program's performance and how you might improve your CSV capture.
This might be a bit of a steep learning curve, but you might want look into SystemTap: http://sourceware.org/systemtap/

What are the available interactive languages that run in tiny memory? [closed]

I am looking for general-purpose programming languages that
have an interactive (live coding) prompt
work in 32 KB of RAM on their own, or in 8 KB when the compiler is hosted on a separate machine
run on a microcontroller with as little as 8-32 KB of RAM total (without an MMU).
Below is my list so far; what am I missing?
Python: The PyMite VM needs 64K flash, 8K RAM. Targets LPC, SAM7 and ATmegas with 8K or more. Hosted.
Lua: The eLua FAQ recommends 256K flash, 64K RAM.
FORTH: amforth needs 8K flash, 150 bytes RAM, 30 bytes EEPROM on an ATmega.
Scheme: armpit Scheme The smallest target is the LPC2103 with 32K Flash, 4K SRAM.
C: Interactive C runs on 68HC11 with no flash and 32K SRAM. Hosted.
C: picoc an open source, cross-compiling, interactive C system. When compiled for AVR, it takes 63K flash, 8K RAM. The RAM could be reduced with effort to keep tables in flash.
C++: AngelScript an open source, byte-code based, C/C++ like scripting language with easy native calls.
Tcl: TinyTCL runs on DOS, 60K binary. Looks easy to port.
BASIC: TinyBasic: Initializes with a 64K heap, might be adjustable.
Lisp
PostScript: (I haven't found a FOSS implementation for low memory yet)
Shell: bitlash: An interactive command shell for Arduino (ATmega). See also AVRSH.
A homebrew Forth runtime can be implemented in very little memory indeed. I know someone who made one on a COSMAC in the 1970s; the core runtime was just 30 bytes.
I hear that CHIP-8, XPL0, PicoC, and Objective Caml have been ported to graphing calculators.
The Wikipedia "Lego Mindstorms" article lists a bunch of programming languages that allegedly run on the Lego RCX or Lego NXT platform.
Do any of them meet your "live coding" criteria?
You might want to check out the other microcontroller Forths at the Forth wiki . It lists at least 4 Forths for the Atmel AVR: amforth (which you already mention), PFAVR, avrforth, and ByteForth.
(Links to those interpreters, as well as this StackOverflow question, are included in the "Embedded Systems" wikibook).
I would recommend Lua (or eLua, http://www.eluaproject.net/ ). I "ported" Lua to a Cortex-M3 a while back. Off the top of my head, it had a flash size of 60-100 KB and needed about 20 KB of RAM to run. I stripped it down to the bare essentials, but depending on your application, that might be enough. There's still room for optimization, especially in the RAM requirements, but I doubt you could run it comfortably in 8 KB.
Some AVR interpreters/VMs:
http://www.cqham.ru/tbcgroup/index_eng.htm
http://www.jcwolfram.de/projekte/avr/chipbasic2/main.php
http://www.jcwolfram.de/projekte/avr/chipbasic8/main.php
http://www.jcwolfram.de/projekte/avr/main.php
http://code.google.com/p/python-on-a-chip/
http://www.avrfreaks.net/index.php?module=Freaks%20Academy&func=viewItem&item_id=688&item_type=project
http://www.avrfreaks.net/index.php?module=Freaks%20Academy&func=viewItem&item_id=626&item_type=project
http://www.avrfreaks.net/index.php?module=Freaks%20Academy&func=viewItem&item_id=460&item_type=project
http://www.harbaum.org/till/nanovm/index.shtml
Wren fits your criteria -- by default it's configured to use just 4k of RAM. AFAIK it hasn't seen any actual use, since the guy I wrote it for decided he didn't need an interpreter running wholly on the target system after all.
The language is influenced most obviously by ML and Forth.
Have you considered a port in C of Tiny Basic? Or, perhaps rewriting the UCSD Pascal p-machine to your architecture from Z-80?
Seriously, though, JavaScript would make a good embedded scripting language, but I've no clue what the minimum memory requirements are for the VM + GC, nor how difficult it is to remove OS dependencies. I played with NJS a while back, which could possibly fit your needs. It is interesting in that the compiler is written in JavaScript (self-hosting).
You can take a look at the very powerful AvrCo Multitasking Pascal for AVR. You can try it at http://www.e-lab.de. The MEGA8/88 version is free. There are tons of drivers and a simulator with a JTAG debugger and nice live or simulated visualizations of all standard devices (LCDCHAR, LCDGRAPH, 7SEG, 14SEG, LEDDOT, KEYBOARD, RC5, SERVO, STEPPER...).
You're missing EmbedVM, homepage here, svn repo here. Remember to check out both [1,2] videos on the front page ;)
From the homepage:
EmbedVM is a small embeddable virtual machine for microcontrollers
with a C-like language frontend. It has been tested with GCC and AVR
microcontrollers. But as the Virtual machine is rather simple it
should be easy to port it to other architectures.
The VM simulates a 16bit CPU that can access up to 64kB of memory. It
can only operate on 16bit values and arrays of 16bit and 8bit values.
There is no support for complex data structures (struct, objects,
etc.). A function can have a maximum of 32 local variables and 32
arguments.
Besides the memory for the VM, a small structure holding the VM state
and the reasonable amount of memory the EmbedVM functions need on the
stack there are no additional memory requirements for the VM.
Especially, the VM does not depend on any dynamic memory management.
EmbedVM is optimized for size and simplicity, not execution speed. The
VM itself takes up about 3kB of program memory on an AVR
microcontroller. On an AVR ATmega168 running at 16MHz the VM can
execute about 75 VM instructions per millisecond.
All memory accesses done by the VM are performed using user callback
functions. So it is possible to have some or all of the VM memory on
external memory devices, flash memory, etc. or "memory-map" hardware
functions to the VM.
The compiler is a UNIX/Linux commandline tool that reads in a *.evm
file and generates bytecode in various formats (binary file, intel hex,
C array initializers and a special debug output format). It also
generates a symbol file that can be used to access data in the VM
memory from the host application.
The C-like language looks like this: http://svn.clifford.at/embedvm/trunk/examples/numberquizz/vmcode.evm
I would recommend MY-BASIC; it runs in as little as 8 KB of RAM and is easy to port.
There's also JavaScript, via Espruino.
This is built specifically for Microcontrollers and there are builds for various different chips (mainly STM32s) that fit a full system into as little as 8kB RAM.
Have you considered simply using the /bin/sh supplied by busybox? Or one of the smaller scripting languages they recommend?
Prolog - http://www.gprolog.org/
According to a Google search for "prolog small", the size of the executable can be made quite small by avoiding linking the built-in predicates.
None of the languages in the list in the question or in the answers proved satisfactory for my requirement of super-easy compilation and integration into an existing microcontroller project (disclosure: I didn't actually try every single suggestion).
Instead I found tinyscript, which is a single .c + .h file that compiles with the rest of the source files in my project; the only additional configuration required is to provide a void outchar(int c), which can be empty if you don't require output from the scripts.
For me, speed of execution is far less important than ease of build, integration, and interop with C, as my use case is mainly just calling some C functions in order.
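For illustration, the whole integration hook can be as small as this (a sketch; uart_putc is a hypothetical output routine, and the empty variant is what I described above):
/* tinyscript output hook: route script output somewhere, or drop it */
void outchar(int c)
{
    (void)c;   /* or e.g. uart_putc((char)c); */
}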
At a previous job I used busybox on a Blackfin.
We compiled Perl + PHP for it; after changing s/fork/vfork/g it worked pretty well... more or less. Not having an MMU is not a good idea: memory fragmentation will kill the server pretty easily. All I did was:
for i in `seq 1 100`; do wget http://black-fin-ip/test.php; done
and it died while I was walking over to my boss to tell him the server was going to die in production :)
I would suggest using Python, but the only problem there is the memory overhead, right? So here is an idea for people who may be stuck on this problem later on.
First things first: write a bf (brainfuck) interpreter, or just get the source code from somewhere. The interpreter will be really small, and bf is a Turing-complete language. Then write your code in Python and transpile it to bf using bfpy ( https://github.com/felko/bfpy/blob/master/README.md ). This is the solution with the least overhead, and I am pretty sure a bf interpreter will easily stay under 10 KB of RAM usage.
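To back up the size claim, here is a minimal bf interpreter sketch in C (no bounds or error checking; tape and program sizes are arbitrary):
#include <stdio.h>

int main(int argc, char **argv)
{
    static char tape[4096];    /* data cells, zero-initialized */
    static char prog[8192];    /* program text */
    if (argc < 2)
        return 1;
    FILE *f = fopen(argv[1], "r");
    if (!f)
        return 1;
    size_t n = fread(prog, 1, sizeof prog - 1, f);
    fclose(f);
    prog[n] = '\0';

    char *pc = prog, *p = tape;
    int depth;
    while (*pc) {
        switch (*pc) {
        case '>': p++; break;
        case '<': p--; break;
        case '+': (*p)++; break;
        case '-': (*p)--; break;
        case '.': putchar(*p); break;
        case ',': *p = (char)getchar(); break;
        case '[':  /* on zero, jump forward past the matching ']' */
            if (!*p)
                for (depth = 1; depth; ) {
                    pc++;
                    if (*pc == '[') depth++;
                    else if (*pc == ']') depth--;
                }
            break;
        case ']':  /* on nonzero, jump back to the matching '[' */
            if (*p)
                for (depth = 1; depth; ) {
                    pc--;
                    if (*pc == ']') depth++;
                    else if (*pc == '[') depth--;
                }
            break;
        }
        pc++;
    }
    return 0;
}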
Erlang - http://erlang.org/
It can fit in 2 MB:
http://www.experts123.com/q/is-erlang-small-enough-for-embedded-systems.html

Using "top" in Linux as semi-permanent instrumentation

I'm trying to find the best way to use 'top' as semi-permanent instrumentation in the development of a box running embedded Linux. (The instrumentation will be removed from the final-test and production releases.)
My first pass is to simply add this to init.d:
top -b -d 15 >/tmp/toploop.out &
This runs top in "batch" mode every 15 seconds. Let's assume that /tmp has plenty of space…
Questions:
Is 15 seconds a good value to choose for general-purpose monitoring?
Other than disk space, how seriously is this perturbing the state of the system?
What other (perhaps better) tools could be used like this?
Look at collectd. It's a very lightweight system monitoring framework coded for performance.
We use sysstat to monitor things like this.
You might find that vmstat and iostat with a delay and no repeat counter are a better option.
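For example, to log forever at 15-second intervals (note that the first report from each is an average since boot):
vmstat 15 >/tmp/vmstat.out &
iostat -x 15 >/tmp/iostat.out &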
I suspect 15 seconds would be more than adequate unless you actually want to watch what's happening in real time, but that doesn't appear to be the case here.
As far as load goes: on an idling PIII 900MHz with 768MB of RAM running Ubuntu (not sure which version, but not more than a year old), top updating every 0.5 seconds uses about 2% CPU. At 15-second updates, I see 0.1% CPU utilization.
Depending upon what exactly you want, you could use the output of uptime, free, and ps to get most, if not all, of top's information.
If you are looking for overall load, uptime is probably sufficient. However, if you want specific information about processes, are adventurous, and have the /proc filesystem enabled, you may want to write your own tools. The primary benefit in this environment is that you can focus on exactly what you want and minimize the load introduced to the system.
The proc filesystem gives your application read access to the kernel memory that keeps track of many of the interesting variables. Reading from /proc is one of the lightest ways to get this information, and you may be able to get more information than top provides. I've done this in the past to get the amount of time a process spends in user and system mode. You can also use it to get the number of file descriptors open by a process, or detailed information about how the network system is working.
Much of this information is pre-processed by other applications, which you can use if they give you the information you need. However, it is rather straightforward to read the raw information. See man proc for more details.
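As a sketch of how light this can be, here is the /proc approach for per-process user/system time (field positions per man proc; illustrative, not the poster's code):
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[1024];
    FILE *f = fopen("/proc/self/stat", "r");
    if (!f || !fgets(buf, sizeof buf, f))
        return 1;
    fclose(f);

    /* comm (field 2) may contain spaces, so scan from the last ')' */
    char *p = strrchr(buf, ')');
    if (!p)
        return 1;
    unsigned long utime, stime;
    /* skip 11 tokens (state .. cmajflt), then utime (14) and stime (15) */
    sscanf(p + 2, "%*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %*s %lu %lu",
           &utime, &stime);
    printf("utime=%lu stime=%lu ticks (%ld ticks/sec)\n",
           utime, stime, sysconf(_SC_CLK_TCK));
    return 0;
}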
Pity you haven't said what you are monitoring for.
You should decide whether 15 seconds is ok or not. Feel free to drop it way lower if you wish (and have a fast HDD)
No worries unless you are running a soft real-time system
Have a look at the tools suggested in other answers. I'll add another suggestion: iotop, for answering "who is thrashing the HDD" questions.
At work, for system monitoring during stress tests, we use a tool called nmon.
What I love about nmon is its ability to export to XLS and generate beautiful graphs for you.
It generates statistics for:
Memory Usage
CPU Usage
Network Usage
Disk I/O
Good luck :)
