I want to run dd over a SanDisk 32GB micro SD but I'm not sure how to decide on the block size.
Usually I use bs=1M, but could I go any higher than that?
Try it!
#!/bin/bash
# Each bs/count pair writes the same 1 GiB, so the timings are directly comparable.
bs=( 32k 64k 128k 256k 512k 1m 2m 4m )
ct=( 32768 16384 8192 4096 2048 1024 512 256 )
for (( x=0; x<${#bs[@]}; x++ )); do
    echo "Testing bs=${bs[x]},count=${ct[x]}"
    dd if=/dev/zero bs=${bs[x]} count=${ct[x]} of=junk
done
Output
Testing bs=32k,count=32768
32768+0 records in
32768+0 records out
1073741824 bytes transferred in 3.094462 secs (346988217 bytes/sec)
Testing bs=64k,count=16384
16384+0 records in
16384+0 records out
1073741824 bytes transferred in 3.445761 secs (311612394 bytes/sec)
Testing bs=128k,count=8192
8192+0 records in
8192+0 records out
1073741824 bytes transferred in 2.937460 secs (365534116 bytes/sec)
Testing bs=256k,count=4096
4096+0 records in
4096+0 records out
1073741824 bytes transferred in 3.247829 secs (330602946 bytes/sec)
Testing bs=512k,count=2048
2048+0 records in
2048+0 records out
1073741824 bytes transferred in 3.212303 secs (334259206 bytes/sec)
Testing bs=1m,count=1024
1024+0 records in
1024+0 records out
1073741824 bytes transferred in 3.129765 secs (343074260 bytes/sec)
Testing bs=2m,count=512
512+0 records in
512+0 records out
1073741824 bytes transferred in 2.908048 secs (369231132 bytes/sec)
Testing bs=4m,count=256
256+0 records in
256+0 records out
1073741824 bytes transferred in 2.996609 secs (358318964 bytes/sec)
You could go higher, but it probably won't make any difference. If you go too high, things might actually slow down.
Different devices have different performance profiles; there is no single universal answer that's right for every SD card or SSD out there.
The only way to get the right answer is to experiment with various block sizes and benchmark the performance.
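One caveat: the loop above writes to a regular file, so the numbers largely reflect the page cache rather than the card itself. To benchmark the SD card directly you can point dd at the device and bypass the cache. A minimal sketch, assuming GNU dd and that the card shows up as /dev/sdX (a placeholder - confirm the device with lsblk first, since this overwrites the card and needs root):
#!/bin/bash
# WARNING: destroys the data on the target device.
dev=/dev/sdX
for b in 65536 524288 1048576 4194304; do
    count=$(( 268435456 / b ))   # write 256 MiB per pass, regardless of block size
    echo "Testing bs=$b bytes"
    # oflag=direct bypasses the page cache; conv=fsync flushes before dd reports its timing.
    dd if=/dev/zero of="$dev" bs=$b count=$count oflag=direct conv=fsync 2>&1 | tail -1
done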
Related
We're running a standard B8ms VM with a 257GB Premium SSD. According to the docs, the throughput should be "Up to 170 MB/second, Provisioned 100 MB/second":
https://azure.microsoft.com/en-us/pricing/details/managed-disks/
However, when I test it, the throughput looks to be about 35 MB/second:
▶ dd if=/dev/zero of=/tmp/test1.img bs=1G count=1 oflag=dsync
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 30.8976 s, 34.8 MB/s
Is there something else I need to account for in order to maximize the throughput?
You have different limits: there is an IOPS limit on the disk and a throughput limit on the disk. If you use bigger blocks when testing you will hit the throughput limit, and if you use smaller blocks you will hit the IOPS limit.
Then there are the VM limits on top of the disk/storage limits, so there are many things to take into consideration when doing these types of tests.
You also have the caching settings on the disks to take into consideration.
https://learn.microsoft.com/en-us/azure/virtual-machines/windows/disks-benchmarks
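To see both limits from inside the VM, it helps to run one test that is clearly throughput-bound and one that is clearly IOPS-bound, and to make sure the test file is actually on the premium data disk (the original command writes to /tmp, which may sit on the OS disk). A rough sketch, assuming GNU dd and the data disk mounted at /datadisk (a placeholder path); oflag=direct bypasses the page cache so you measure the disk rather than RAM:
# Large sequential writes: should push toward the disk's MB/s (throughput) limit.
dd if=/dev/zero of=/datadisk/throughput.img bs=1M count=1024 oflag=direct
# Small writes: every block counts as a separate I/O, so this runs into the IOPS limit instead.
dd if=/dev/zero of=/datadisk/iops.img bs=4k count=65536 oflag=direct
The original test used bs=1G with oflag=dsync, which allocates a 1 GB buffer and issues a single synchronous write; that pattern tends to understate what the disk can sustain with a steady stream of smaller writes.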
I am running a 5-node Apache Cassandra cluster (3.11.4), with 48 GB RAM, 12 GB heap memory and 6 vCPUs per node. I can see a lot of load (18 GB) on the Cassandra server nodes even when there is no data processing going on, and a lot of GC pauses, because of which I get "NoHostAvailable" exceptions when I try to push data to Cassandra.
Please suggest how to reduce this load and how I can avoid the "NoHostAvailable" connection failures.
ID : a65c8072-636a-480d-8774-2c5704361bec
Gossip active : true
Thrift active : true
Native Transport active: true
Load : 18.07 GiB
Generation No : 1576158587
Uptime (seconds) : 205965
Heap Memory (MB) : 3729.16 / 11980.81
Off Heap Memory (MB) : 12.81
Data Center : dc1
Rack : rack1
Exceptions : 21
Key Cache : entries 2704, size 5.59 MiB, capacity 100 MiB, 1966 hits, 4715 requests, 0.417 recent hit rate, 14400 save period in seconds
Row Cache : entries 0, size 0 bytes, capacity 0 bytes, 0 hits, 0 requests, NaN recent hit rate, 0 save period in seconds
Counter Cache : entries 0, size 0 bytes, capacity 50 MiB, 0 hits, 0 requests, NaN recent hit rate, 7200 save period in seconds
Chunk Cache : entries 25, size 1.56 MiB, capacity 480 MiB, 4207149 misses, 4342386 requests, 0.031 recent hit rate, NaN microseconds miss latency
Percent Repaired : 34.58708788430304%
Token : (invoke with -T/--tokens to see all 256 tokens)
If you have 48 GB RAM, I recommend increasing the heap to at least 16 GB or 20 GB, and make sure that you are using G1 GC (Cassandra 3.11 ships configured for CMS, so G1 has to be enabled explicitly).
But NoHostAvailable may also depend on the consistency level that you are using, and other factors.
On the other hand, you may consider throttling your application - sometimes pushing data more slowly leads to better overall throughput.
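A minimal sketch of what that looks like in conf/jvm.options, assuming the stock file shipped with Cassandra 3.11 (the exact lines present in your copy may differ):
# conf/jvm.options
# Fix the heap explicitly instead of letting cassandra-env.sh pick a size:
-Xms16G
-Xmx16G
# Comment out the CMS settings that are enabled by default, e.g.:
#-XX:+UseParNewGC
#-XX:+UseConcMarkSweepGC
# ...and enable the G1 section instead:
-XX:+UseG1GC
-XX:MaxGCPauseMillis=500
Restart the nodes one at a time after changing the heap settings so the cluster stays available.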
I have a vhd (disk) of size 500 MB, of which only 10 MB of data is written, followed by empty chunks and finally one more block of 10 MB towards the end.
So the total data present is just 20 MB out of 500 MB.
I am trying to find a utility in node.js to find out the number of data bytes, but have not succeeded.
There is a function fs.fstatSync(file).size, which gives the total size.
Is there any utility/functions to calculate the data written?
You will probably have to use require("child_process") to call out to a system utility.
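As a sketch of what that child process could run, assuming a Linux host with GNU coreutils and disk.vhd as a placeholder filename: if the VHD is stored as a sparse file, du reports only the blocks that were actually allocated, while ls -l (like fs.fstatSync().size) reports the apparent 500 MB size.
# Apparent size (what fs.fstatSync().size reports):
ls -l disk.vhd
# Bytes actually allocated on disk (sparse-aware):
du --block-size=1 disk.vhd
# Or both at once: apparent size plus allocated 512-byte blocks:
stat -c '%s bytes apparent, %b blocks allocated' disk.vhd
If the image is a fixed (fully allocated) VHD rather than a sparse file, the zero regions still occupy disk blocks, and you would have to parse the VHD format or scan the file contents instead.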
I'm trying to pipe extremely high speed data from one application to another using 64-bit CentOS6. I have done the following benchmarks using dd to discover that the pipes are holding me back and not the algorithm in my program. My goal is to achieve somewhere around 1.5 GB/s.
First, without pipes:
dd if=/dev/zero of=/dev/null bs=8M count=1000
1000+0 records in
1000+0 records out
8388608000 bytes (8.4 GB) copied, 0.41925 s, 20.0 GB/s
Next, a pipe between two dd processes:
dd if=/dev/zero bs=8M count=1000 | dd of=/dev/null bs=8M
1000+0 records in
1000+0 records out
8388608000 bytes (8.4 GB) copied, 9.39205 s, 893 MB/s
Are there any tweaks I can make to the kernel or anything else that will improve performance of running data through a pipe? I have tried named pipes as well, and gotten similar results.
Have you tried with smaller blocks?
When I try on my own workstation I notice successive improvements as I lower the block size.
It is only in the realm of 10% in my test, but still an improvement. You are looking for 100%.
As it turns out, on testing further, really small block sizes seem to do the trick:
I tried
dd if=/dev/zero bs=32k count=256000 | dd of=/dev/null bs=32k
256000+0 records in
256000+0 records out
256000+0 records in
256000+0 records out
8388608000 bytes (8.4 GB) copied, 1.67965 s, 5.0 GB/s
8388608000 bytes (8.4 GB) copied, 1.68052 s, 5.0 GB/s
And with your original
dd if=/dev/zero bs=8M count=1000 | dd of=/dev/null bs=8M
1000+0 records in
1000+0 records out
1000+0 records in
1000+0 records out
8388608000 bytes (8.4 GB) copied, 6.25782 s, 1.3 GB/s
8388608000 bytes (8.4 GB) copied, 6.25203 s, 1.3 GB/s
5.0/1.3 ≈ 3.8, so that is a sizable factor.
It seems that Linux pipes only yield up 4096 bytes at a time to the reader, regardless of how large the writer's writes were.
So trying to stuff more than 4096 bytes into an already full pipe per write(2) system call will just cause the writer to stall until the reader has issued the multiple reads needed to pull that much data out of the pipe and do whatever processing it has in mind to do.
This tells me that on multi-core or multi-threaded CPUs (does anyone still make a single-core, single-thread CPU?), one can get more parallelism, and hence shorter elapsed clock times, by having each writer in a pipeline write only 4096 bytes at a time before going back to whatever data processing or production it can do towards making the next 4096-byte block.
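A quick way to check this on your own kernel, assuming the same 8 GB transfer as above, is to keep both sides of the pipe at 4 KiB and see whether throughput improves further:
dd if=/dev/zero bs=4k count=2048000 | dd of=/dev/null bs=4k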
I want to know what the advantage of writing a file block by block is. I assume it will reduce the number of I/O operations, but in a Linux-like environment the data goes to the page cache anyway and a background daemon does the physical disk writing (correct me if I'm wrong). In that kind of environment, what are the advantages of writing in blocks?
If I understand your question correctly, you are asking about the advantages of using larger blocks, rather than writing character-by-character.
You have to consider that each use of a system call (e.g. write()) has a minimum cost by itself, regardless of what is being done. In addition it may cause the calling process to be subjected to a context switch, which has a cost of its own and also allows other processes to use the CPU, causing even more significant delays.
Therefore - even if we forget about direct and synchronous I/O modes where each operation may make it to the disk immediately - it makes sense from a performance standpoint to reduce the impact of those constant costs by moving around larger blocks of data.
A simple demonstration using dd to transfer 1,000,000 bytes:
$ dd if=/dev/zero of=test.txt count=1000000 bs=1 # 1,000,000 blocks of 1 byte
1000000+0 records in
1000000+0 records out
1000000 bytes (1.0 MB) copied, 1.55779 s, 642 kB/s
$ dd if=/dev/zero of=test.txt count=100000 bs=10 # 100,000 blocks of 10 bytes
100000+0 records in
100000+0 records out
1000000 bytes (1.0 MB) copied, 0.172038 s, 5.8 MB/s
$ dd if=/dev/zero of=test.txt count=10000 bs=100 # 10,000 blocks of 100 bytes
10000+0 records in
10000+0 records out
1000000 bytes (1.0 MB) copied, 0.0262843 s, 38.0 MB/s
$ dd if=/dev/zero of=test.txt count=1000 bs=1000 # 1,000 blocks of 1,000 bytes
1000+0 records in
1000+0 records out
1000000 bytes (1.0 MB) copied, 0.0253754 s, 39.4 MB/s
$ dd if=/dev/zero of=test.txt count=100 bs=10000 # 100 blocks of 10,000 bytes
100+0 records in
100+0 records out
1000000 bytes (1.0 MB) copied, 0.00919108 s, 109 MB/s
As an additional benefit, using larger blocks of data lets both the I/O scheduler and the filesystem's allocator make more accurate estimates about your actual workload.