I use FreeBSD 11 but IOPS is very, very poor ('fio' tool) - freebsd

I am using the 'fio' disk tool to test speed. The disk is an Intel SSD; TRIM and 4k alignment are enabled.
The hardware is a Dell R610.
The disk controllers are AHCI and an LSI 9211-8i with P20 IT-mode firmware; sysctl settings are default.
The FreeBSD file system is ZFS or UFS; on CentOS it is XFS.
If I install CentOS 7 and run the same 'fio' test, there is no problem.
The command is:
fio -filename=/mnt/test.fio_test_file -direct=1 -iodepth=1 -thread -rw=randread (or randrw/randwrite) -ioengine=psync -bs=4k -size=1G -numjobs=1 (or 64) -runtime=30 -group_reporting -name=pleasehelpme
FreeBSD speed is:
[59172KB/0KB/0KB /s] [14.8K/0/0 iops]
CentOS 7 speed is:
[248.5MB/0KB/0KB /s] [63.5K/0/0 iops]
That is close to 5 times the speed!
Testing randrw and randwrite, the problem remains.
But without rand, plain sequential read or write is very good, even faster than CentOS.
I have not tried other tools on FreeBSD 11; maybe it is a fio problem? But I'm not sure...
So why, and how do I fix it?
Update 2016-12-06:
I read https://github.com/axboe/fio/ .
I thought it was a fio problem, but I also tested PostgreSQL (with the same configuration on both systems) and the TPS is very different between the two systems.
So it looks like FreeBSD really does perform poorly here, rather than this being a fio problem.
Maybe it is a configuration problem? I do not know...
Update 2017-01-08:
I have given up on FreeBSD 11 and switched to CentOS 7.
FreeBSD 11 performance should be great, but it is not. Maybe my configuration is wrong, but I cannot fix this disk IOPS problem, so I had to give up.
QAQ..... If you can fix this problem, please tell me.
Thank you very much.

In some cases, depending on the hardware, FreeBSD may need some adjustments. Sometimes it can be an issue with the controller (Dell PERC); in other cases a simple kernel flag can help.
From https://wiki.freebsd.org/BenchmarkAdvice
Parallel read/write tests
If you do a FS/disk I/O test where writes and reads are interleaved / in parallel, you need to be aware that FreeBSD prioritizes writes over reads.
Check vfs.hidirtybuffers; generally, lower it in order to force out dirty pages earlier and thus reduce the number that fsync has to deal with.
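For example, a minimal sketch on FreeBSD (the value below is only a placeholder; a sensible number depends on RAM and workload):
# Show the current limit on dirty buffers
sysctl vfs.hidirtybuffers
# Lower it so dirty pages are flushed earlier (placeholder value; persist via /etc/sysctl.conf if it helps)
sysctl vfs.hidirtybuffers=4096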
Benchmarking ZFS
If you want to benchmark ZFS, be aware that it will only shine if you are willing to spend money. Using ZFS on one or two disks will not give improved performance (compared to e.g. UFS), but it will give improved safety for your data (you know when your data is damaged by e.g. radiation or data-manipulating hard disk errors). To make it shine you need to add at least a lot of RAM, or one read-optimized SSD for the L2ARC cache for read performance (the number of SSDs depends upon the size of the working set), or two mirrored (for data safety in case one SSD gets damaged) write-optimized SSDs for the ZIL for synchronous (DBs/NFS/...) write performance.
Try to use diskinfo
diskinfo -t /dev/ada0
The -t option triggers a simple and rather naive benchmark of the disk's seek and transfer performance.
For ZFS: https://wiki.freebsd.org/ZFSTuningGuide

Related

Why is the iops observed by fio different from that observed by iostat?

Recently, I have been trying to test my disk using fio. My fio configuration is as follows:
[global]
invalidate=0 # mandatory
direct=1
#sync=1
fdatasync=1
thread=1
norandommap=1
runtime=10000
time_based=1
[write4k-rand]
stonewall
group_reporting
bs=4k
size=1g
rw=randwrite
numjobs=1
iodepth=1
In this configuration, you can see that I configured fio to do random writes using direct IO. While the test is running, I used iostat to monitor the I/O performance. And I found that if I set fdatasync to 1, the IOPS observed by fio is about 64, while that observed by iostat is about 170. Why is this different? And if I don't configure "fdatasync", both IOPS figures are approximately the same, but much higher, about 450. Why? As far as I know, direct IO does not go through the page cache, which, in my opinion, means that it should take about the same time no matter whether fdatasync is used.
And I heard that iostat can come up with wrong statistics under some circumstances. Is that true? What exact circumstances could make iostat go wrong? Are there any other tools that I can use to monitor the I/O performance?
Looking at your jobfile, it appears you are not doing I/O against a block device but instead against a file within a filesystem. Thus, while you may ask the filesystem to "put this data at that location in that file", the filesystem may turn that into multiple block device requests because it also has to update metadata associated with that file (e.g. the journal, file timestamps, copy on write, etc.). Thus, by the time the requests are sent down to the disk (which is what you're measuring with iostat), the original request has been amplified.
Something to also bear in mind is that Linux may have an ioscheduler for that disk. This can rearrange, split and merge requests before submission to the disk / returning them further up in the stack. See the different parameters of nomerges in https://www.kernel.org/doc/Documentation/block/queue-sysfs.txt for how to avoid some of the merging/rearranging but note you can't control the splitting of a request that is too large (but a filesystem won't make overly large requests).
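As a rough illustration (the device name is an example; see the kernel documentation linked above for the exact meaning of each value):
cat /sys/block/sda/queue/nomerges      # 0 = all merges allowed, 1 = only simple one-hit merges, 2 = no merging
echo 2 > /sys/block/sda/queue/nomerges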
(PS: I've not known iostat to be "wrong" so you might need to ask the people who say it directly to find out what they mean)

Linux 2.6.43, ext3, 10K RPM SAS disk: 2 sequential writes (direct IO) to different files acting like random writes

I am recently stuck on this problem:
"2 sequential writes (direct IO, 4KB-aligned blocks) to different files act like random writes, which yields poor write performance on a 10K RPM SAS disk."
The thing that confuses me most: I have a batch of servers, all equipped with the same kind of disks (RAID 1 with two 300GB 10K RPM disks), but they respond differently.
Several servers seem OK with this kind of write pattern; the disk happily accepts up to 50+MB/s
(same kernel version, same filesystem, but a different libc (libc 2.4)).
Others, not so much: 100 ops/s seems to be the limit of the underlying disk, which matches the random-write performance of the disk
(same kernel version, same filesystem, but a different libc (libc 2.12)).
[NOTE: I checked the "pwrite" code of the different libc versions, which shows nothing but a simple syscall.]
I have managed to rule out these possibilities:
1. A software bug in my own program:
checked with a simple daemon (compiled with no dynamic linking) doing sequential direct IO writes;
2. A disk problem:
I switched between two different Linux systems on one test machine, which performed well with my direct IO write pattern, and a couple of days after switching back to the old libc version, the bad random-write behavior returned.
I tried to compare:
/sys/block/sda/queue/*, which may differ either way;
filefrag, which shows nothing but the two different files growing with interleaved, sequential physical block IDs.
There must be some kind of write strategy leading to this problem, but I don't know where to start:
A different kernel setting? Maybe related to how ext3 allocates disk blocks?
The RAID cache (write-back) or the disk cache write strategy?
Or the underlying disk's strategy for mapping logical blocks onto real physical blocks?
Any help is really appreciated.
THE ANSWER IS:
It is because of the /sys/block/sda/queue/scheduler setting:
MACHINE A: displays the scheduler as cfq, but underneath it behaves like deadline;
MACHINE B: the scheduler behavior is consistent with cfq.
//=>
Since my server is a DB server, deadline is my best option.
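For reference, a sketch of how the active scheduler can be checked and switched at runtime (the device name is an example, and the change does not survive a reboot unless it is also set via boot parameters or a udev rule):
cat /sys/block/sda/queue/scheduler               # the scheduler in brackets is active, e.g. "noop [cfq] deadline"
echo deadline > /sys/block/sda/queue/scheduler   # switch this device to deadline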

Is zfs on linux reliable enough to be installed with gentoo on a network attached storage?

I am having a hard time deciding which filesystem is best and whether or not to use RAID. I have 4 different hard disks.
1. 120GB SSD
2. 160GB
3. 500GB
4. 1TB
And I have noticed that ZFS on Linux does not support TRIM officially, only via a third-party patch, so it is not fully tested...
As a result, I don't intend to use ZFS on my SSD; I will use ext4 instead...
So, is ZFS on Linux reliable enough to be installed with Gentoo on network attached storage?
Or do you have another good solution for me?
P.S. If ZFS is reliable enough, is RAIDZ a good choice?
Define "reliable enough". If you're expecting bug-free operation, then no, it simply is not there yet. If you're happy to potentially pick up the pieces from a total failure of the file system, then it's good enough.
If your data has any value and you're talking about a production machine, do not touch unproven file system drivers with a barge pole unless you know exactly what you're getting into. A third-party patch for TRIM is definitely living on the edge.
If you desperately want to use ZFS, use FreeBSD where it is more mature. If you're sticking with Linux, I would personally keep to ext4. It's a solid system, and the drives you're talking about don't really demand the benefits of ZFS, in my opinion.
In your description you mention a set of disks you want to use. If you used this set in e.g. RAIDZ and one disk crashed, your disk array would lose all its data.
Reliability is a very broad term:
If you want to protect your data against bit rot, then ZFS is an excellent choice.
If you want to use it with your available disks and would like to protect your data against hardware failures, consider using a more homogeneous set of disks.
Regarding your selection of OS, try OpenIndiana (http://openindiana.org/) or a turn-key OS like http://www.freenas.org/ for relatively simple (but effective) configurations.
OpenIndiana is very reliable but a huge pain to run. FreeBSD has ZFS support built in. However (depending on what you are doing and how much reliability you need), I would install it on Ubuntu if there is any margin for error... if not, then use OpenIndiana or FreeBSD.

Testing IO performance in Linux [closed]

How do I test IO performance in Linux?
IO and filesystem benchmarking is a complex topic. No single benchmarking tool is good in all situations. Here is a small overview of different benchmarking tools:
Block Storage:
IOMeter - Highly customizable and allows you to coordinate multiple clients. Needs a Windows PC for the coordination application. Developed by Intel. On Linux, take the maximum rates of older versions (at least 2006.07.27 and earlier) with a pinch of salt, because the submission method was not optimal.
File System (synthetic):
FFSB - Flexible Filesystem Benchmark. Very neat benchmarking tool for Linux. Good workload customization. NFS benchmarking (net-ffsb) is a bit unsound.
Filebench - Extremely powerful, but originally developed for Solaris. Linux support isn't good.
sysbench - Mainly a DB benchmarking tool, but also a basic filesystem benchmarking tool.
bonnie - Seems to be obsolete.
bonnie++ - C++ port of bonnie. Easy, but does not seem to be very customizable.
File System (workload):
Postmark - Simulates the IO behavior of a mail server. Too small to stress good IO systems.
Stony Brook University and IBM Watson Labs have published a highly recommended journal paper in "Transactions on Storage" about file system benchmarking, in which they present different benchmarks and their strong and weak points: A nine year study of file system and storage benchmarking. The article clearly points out that the results of most benchmarks are at least questionable.
A note: is the question programming related? Maybe not, but maybe it is. I spend a lot of time benchmarking the IO performance of the systems I develop. At least for me, questions about how to benchmark these things are highly programming related. Please do not close all questions that are not development/programming related from your point of view. The point of view of other developers might be different.
tool: fio
link: http://freshmeat.net/projects/fio/
test physical disk IO:
./fio examples/disk-zone-profile
set the parameters:
sequential r/w: rw=read or rw=write
random r/w: rw=randread or rw=randwrite
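For example, a quick random-read run might look like this (the file path, size, and runtime are placeholders, not recommendations):
fio --name=randread-test --filename=/tmp/fio.testfile --rw=randread --bs=4k --size=1G --direct=1 --ioengine=psync --numjobs=1 --runtime=60 --group_reporting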
If you need a quick way without the hassle of installing anything, this is the method I use for a write speed test:
dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync
And the output is something like this
root#rackserver:/# dd if=/dev/zero of=test bs=64k count=16k conv=fdatasync
16384+0 records in
16384+0 records out
1073741824 bytes (1.1 GB) copied, 4.86922 s, 221 MB/s
Also:
Delete the test file afterwards to recover the extra space it used.
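For example (assuming the same file name used in the dd command above):
rm test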
Some explanation:
bs = block size
count = the number of blocks to be written
Adjust these parameters to change the size of the file written as per your server specs and the amount of time you want to spend writing.
The read speed, as already suggested by gtsouk, can be checked by using /dev/null as the output.
dd if=/dev/sda of=/dev/null
Let this run for a few minutes and stop it with ctrl+C. It will print the read transfer speed of your drive/controller. This is the maximum read speed you can get out of your drive.
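If you prefer a bounded run instead of interrupting it, something like this also works (the device and the amount read are examples; reading is non-destructive):
dd if=/dev/sda of=/dev/null bs=1M count=4096 iflag=direct   # read 4 GiB, bypassing the page cache so cached data does not inflate the number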
sysbench
See http://www.howtoforge.com/how-to-benchmark-your-system-cpu-file-io-mysql-with-sysbench
Example
sysbench --test=fileio --file-total-size=150G prepare
sysbench --test=fileio --file-total-size=150G --file-test-mode=rndrw --init-rng=on --max-time=300 --max-requests=0 run
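When you are done, the test files can be removed with the matching cleanup step (same old-style sysbench syntax as the commands above):
sysbench --test=fileio --file-total-size=150G cleanup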
It can also test CPU, memory, threads, and database server performance.
It's awesome.
Or, for testing software written in Java: http://www.dacapobench.org/
You need to specify what you're testing for, otherwise benchmarks will only mislead you. There are different aspects of IO performance that you need to choose to optimize for, and different parameters to play with.
Your system parameters:
storage device: HDD, SSD (which?), Raid (which?)
filesystem, block size, journal mode
file cache, dirty thresholds, amount of memory
IO scheduler, its tunables
number of CPUs
kernel version
Your test parameters:
read or write performance?
sequential or random?
1 thread or multiple?
size of requests
optimize for throughput or request delay?
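As a rough sketch, here is how several of those test parameters map onto fio options (the file path and values are placeholders, not recommendations):
# rw / rwmixread    -> read vs. write, sequential vs. random
# bs                -> size of requests
# numjobs / iodepth -> one thread or multiple, and queue depth
# direct            -> bypass the file cache
fio --name=param-demo --filename=/tmp/fio.paramfile --rw=randrw --rwmixread=70 --bs=8k --numjobs=4 --iodepth=16 --ioengine=libaio --direct=1 --size=1G --runtime=60 --time_based --group_reporting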
There is an excellent program to test block storage IO on Unix called IORATE. You can get a copy at iorate.org.
It can generate complex mixed IO, including re-use (hits) and hot zones for tiered storage testing.
Take a look at IOzone:
http://www.iozone.org/
If you would like to read a whitepaper illustrating real-world usage on an HPC cluster, please see this pdf, page 36:
http://i.dell.com/sites/content/business/solutions/hpcc/en/Documents/Dell-NSS-NFS-Storage-solution-final.pdf

Using "top" in Linux as semi-permanent instrumentation

I'm trying to find the best way to use 'top' as semi-permanent instrumentation in the development of a box running embedded Linux. (The instrumentation will be removed from the final-test and production releases.)
My first pass is to simply add this to init.d:
top -b -d 15 >/tmp/toploop.out &
This runs top in "batch" mode every 15 seconds. Let's assume that /tmp has plenty of space…
Questions:
Is 15 seconds a good value to choose for general-purpose monitoring?
Other than disk space, how seriously is this perturbing the state of the system?
What other (perhaps better) tools could be used like this?
Look at collectd. It's a very lightweight system monitoring framework coded for performance.
We use sysstat to monitor things like this.
You might find that vmstat and iostat with a delay and no repeat counter are a better option.
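For example (a minimal sketch; the interval and output files are arbitrary):
vmstat 15 >> /tmp/vmstat.out &
iostat -x 15 >> /tmp/iostat.out &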
I suspect 15 seconds would be more than adequate unless you actually want to watch what's happening in real time, but that doesn't appear to be the case here.
As far as load, on an idling PIII 900MHz with 768MB of RAM running Ubuntu (not sure which version, but not more than a year old) I have top updating every 0.5 seconds and it's about 2% CPU utilization. At 15s updates, I'm seeing 0.1% CPU utilization.
Depending upon what exactly you want, you could use the output of uptime, free, and ps to get most, if not all, of top's information.
If you are looking for overall load, uptime is probably sufficient. However, if you want specific information about processes, you are adventurous, and you have the /proc filesystem enabled, you may want to write your own tools. The primary benefit in this environment is that you can focus on exactly what you want and minimize the load introduced to the system.
The proc file system gives your application read access to the kernel memory that keeps track of many of the interesting variables. Reading from /proc is one of the lightest ways to get this information. Additionally, you may be able to get more information than provided by top. I've done this in the past to get amount of time spent in user and system by this process. Additionally, you can use this to get information about the number of file descriptors open by the process. You might also use this to get detailed information about how the network system is working.
Much of this information is pre-processed by other applications, which can be used if they give you the information you need. However, it is rather straightforward to read the raw information. Do a man proc for more information.
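As a small illustration (the PID is a placeholder; field numbers follow man proc, and the stat line assumes the process name contains no spaces):
pid=1234
grep -E 'VmRSS|Threads' /proc/$pid/status    # resident memory and thread count
ls /proc/$pid/fd | wc -l                     # number of open file descriptors
awk '{print "utime:", $14, "stime:", $15}' /proc/$pid/stat    # CPU time in clock ticks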
It's a pity you haven't said what you are monitoring for.
You should decide whether 15 seconds is OK or not. Feel free to drop it way lower if you wish (and have a fast HDD).
No worries unless you are running a soft real-time system.
Have a look at the tools suggested in other answers. I'll add another suggestion: "iotop", for answering "who is thrashing the HDD" questions.
At work, for system monitoring during stress tests, we use a tool called nmon.
What I love about nmon is it has the ability to export to XLS and generate beautiful graphs for you.
It generates statistics for:
Memory Usage
CPU Usage
Network Usage
Disk I/O
Good luck :)
