Cheap way to improve I/O [closed]

I saw http://www.youtube.com/watch?v=96dWOEa4Djs via http://www.joelonsoftware.com/items/2009/03/27.html and was amazed by the improvement.
I have a good workstation (Sun Ultra M4, 2 AMD Opteron, 8GB RAM, NVidia FX 1500) and it feels as fast as... any other computer in the city (except when rendering).
I blame Windows for it (I can't use Linux because I run 3D Max), but now I wonder whether it is possible to improve the I/O.
I run VMs (1-3 at a time), 3D Max, Photoshop and Python... plus some video encoding and things like that.
I don't have enough money to buy an SSD, and I have 2 SATA drives. What can I do? Is it possible to mount a RAM drive on Windows? How would I use it?

Have you thought about using a RAID array? You can get some decent I/O improvements from a RAID-0 configuration.
Although I must ask: are you sure your bottleneck is disk I/O and not memory or CPU? In my experience, disk I/O has traditionally been the last bottleneck on a machine (especially on large-scale machines); more often than not, memory, poor use of page files, and CPU throughput have been the pressure points.

Sounds like you're probably CPU bound. All the programs you listed depend heavily on memory and CPU rather than disk speed. Since it looks like you have plenty of memory, I'm guessing it's mostly the CPU slowing you down.
If you really do wish to improve your disk performance without spending much money, you can try putting your disks in a RAID 0 setup. This makes your computer treat them as one large storage volume and speeds things up by reading from both disks simultaneously. Keep in mind that this also increases the likelihood that you will lose data, since the RAID volume could become corrupt or one of the disks could fail (causing the data on both disks to be lost).
Alternatively, you can try buying a faster disk drive. Newegg sells Western Digital Raptor drives (currently the fastest non-SSD SATA disks available) for between $100 and $150 (after rebate): http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=40000014&Description=raptor&name=Internal%20Hard%20Drives. These could give you a 20-30%+ boost in disk I/O, depending on how good your current drives are.

Go for UltraSCSI if you want more disk I/O bandwidth. But do not measure your disk speed by how fast programs load. Better disk subsystems and/or configurations (such as RAID) are mainly useful for transferring large data blocks, e.g. video/audio editing, not for loading operating system files or application executables.
Did you scan your computer for spyware? ;)

Related

Does Linux have a page file? [closed]

I have found in several places that Linux uses pages and a paging mechanism, but I couldn't find anywhere where this file is or how to configure it.
All the information I found is about the Linux swap file / partition. There is a difference between paging and swapping:
Paging moves individual pages (a small frame which contains a piece of data, usually 4 KB but it can vary between OSes) from main memory to backing storage; it happens all the time as a normal function of the operating system.
Swapping moves an entire process to storage and happens when the system is under memory pressure, or on Windows 8 when an application is hibernated.
Does Linux use its swap file / partition for both cases?
If so, how can I see how many pages are currently paged out? This information does not seem to be in the vmstat, free or swapon commands (or I am failing to see it).
Or is there another file used for paging?
If so, how can I configure it (and watch its usage)?
Or perhaps Linux does not use paging at all and I was misled?
I would appreciate answers specific to Red Hat Enterprise Linux versions 6 and 7, but a general answer about all Linuxes would also be good.
Thanks in advance.
On Linux, the swap partition(s) are used for paging.
Linux does not respond to memory pressure by swapping out whole processes. The virtual memory system does demand paging, page by page. Under extreme memory pressure, one or more processes will be killed by the OOM killer. (There are some useful links to documentation in the first NOTE in man malloc)
There is a line in the top header which shows swap partition usage, but if that is all the information you want, use
swapon -s
man swapon for more information.
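If you'd rather watch this programmatically than via top or swapon, the kernel exposes the relevant counters under /proc. A minimal sketch in Python (assuming Linux; SwapTotal/SwapFree live in /proc/meminfo, and pswpin/pswpout in /proc/vmstat count pages swapped in and out since boot):

    # Swap usage from /proc/meminfo (values are reported in kB)
    meminfo = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            meminfo[key] = int(value.split()[0])
    print("swap in use (kB):", meminfo["SwapTotal"] - meminfo["SwapFree"])

    # Cumulative paging activity from /proc/vmstat ("name value" per line)
    with open("/proc/vmstat") as f:
        vmstat = dict(line.split() for line in f)
    print("pages swapped out since boot:", vmstat["pswpout"])
    print("pages swapped in since boot: ", vmstat["pswpin"])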
The swap partition usage is not the same as the number of pages that have been paged out of memory. A page might be memory-mapped to a file using the mmap call; since that page has backing store in the file, there is no need to also write it to a swap partition, and the system won't use swap space for it. But swap partition usage is a pretty good indicator.
Also note that Linux (unlike Windows) does not allocate swap space for pages when they are allocated. Instead, it adds the new page to the virtual memory map without any backing store, and only allocates swap space when the page needs to be swapped out. The consequence (as described in the malloc manpage referenced earlier) is that a malloc call may succeed in allocating virtual memory, but a subsequent attempt to use that virtual memory may fail.
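A small way to see that lazy behaviour for yourself. The sketch below (Linux-only Python; the 2 GiB and 100 MiB sizes are arbitrary choices of mine) creates an anonymous mapping, which succeeds immediately, and shows that resident memory only grows once pages are actually written:

    import mmap, re

    def rss_kib():
        """Resident set size of this process, read from /proc/self/status."""
        with open("/proc/self/status") as f:
            return int(re.search(r"VmRSS:\s+(\d+) kB", f.read()).group(1))

    base = rss_kib()
    buf = mmap.mmap(-1, 2 * 1024**3)             # ask for 2 GiB of anonymous memory
    print("after mmap:  +", rss_kib() - base, "KiB resident")   # roughly 0: nothing committed yet

    touch = 100 * 1024**2                        # now write the first 100 MiB
    buf[0:touch] = b"\xff" * touch
    print("after write: +", rss_kib() - base, "KiB resident")   # roughly 100 MiB is now resident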
Although Linux retains the term 'swap partition' as a historical relic, it actually performs paging. So your expectation is borne out; you were just thrown by the archaic terminology.

btrfs raid1 with multiple devices [closed]

I have 6 devices: 4TB, 3TB, 2TB, 2TB, 1.5TB, 1TB (/dev/sda to /dev/sdf).
First question:
With RAID-1 I'd have:
2TB mirrored on the other 2TB
1TB mirrored on 0.5TB of the 4TB + 0.5TB of the 3TB
1.5TB mirrored on 1.25TB of the 4TB + 0.25TB of the 3TB
the remaining 2.25TB of the 3TB mirrored on the remaining 2.25TB of the 4TB.
My total size in that case would be (4 + 3 + 2 + 2 + 1.5 + 1) / 2 = 13.5 / 2 = 6.75TB.
Will $ mkfs.btrfs --data raid1 --metadata raid1 /dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf provide me with approximately 6.75TB? If yes, how many disks (and which ones?) can I afford to lose?
Second question:
With that RAID-1 layout I can afford, for example, to lose three disks:
one 2TB disk,
the 1TB disk and
the 1.5TB disk,
without losing data.
How can I have the same freedom to lose those same disks with btrfs?
Thanks!
Btrfs distributes the data (and its RAID 1 copies) block-wise, and thus deals very well with hard disks of different sizes. You will get roughly the sum of all hard disks divided by two, and you do not need to think about how to pair the disks up by size.
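As a rough sanity check of that rule, here is an approximation of my own (not official btrfs accounting): every block needs a copy on a different device, so usable space is limited both by half the total and by how much the other disks can mirror of the largest one.

    def raid1_usable_tb(disks):
        """Approximate usable btrfs RAID-1 capacity for a list of disk sizes (in TB)."""
        total = sum(disks)
        return min(total / 2, total - max(disks))

    print(raid1_usable_tb([4, 3, 2, 2, 1.5, 1]))   # 6.75, matching the manual pairing in the question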
If more than one disk fails, you're always in danger of losing data: RAID 1 cannot deal with losing two disks at the same time. In the example given above, if the wrong two disks die, you lose data.
Btrfs can increase the chance of losing data when more than one disk fails: since it distributes the blocks somewhat randomly, the chance is higher that some blocks are stored only on the two failed devices. On the other hand, if you do lose data, you will probably lose less, for the same reason. On average it adds up to the same expected amount of lost data, but if you care about the chance of losing even a single bit, you're worse off with btrfs.
Then again, you should also consider its advantage of using checksums, which help against corrupted data on disk.

Does network stack on my computer use DMA? [closed]

I learned that hard disk data is transferred to main memory using DMA, but that network data cannot use DMA and has to go through the processor. Is that true? If so, what are the ways to avoid it? Isn't it really inefficient to transfer data through the processor?
Most modern network cards, like most other hardware, also use DMA for data transfer. The confusion stems from the fact that the CPU still has to process the data coming from user applications into the form the network card expects (for example, TCP segments wrapped in Ethernet frames). This processing has to be done by the CPU, since the CPU implements the various network protocols used to send data.
Incidentally, the same can be said of hard drives. Though DMA is used for transferring large blocks of data from RAM to the hard disk, almost inevitably the CPU must verify that these blocks of data will be placed at the correct location and formatted to the correct filesystem type.
Is it true?
No! Network devices DMA into memory buffers specifically allocated for this purpose. DMA for network IO has been the general rule in the x86 world since the early 1990's when the PCI bus emerged.
Isn't it really inefficient to transfer data through processor?
Yes, incredibly inefficient. After initialization, the only time a core interacts directly with a modern network card is to signal the "transmit doorbell". This doorbell is a lone write operation which tells the card to look into memory for new packets to transmit. All other interactions between core and network device take place indirectly via memory.

Why do high-IO-rate operations slow everything down on Linux? [closed]

This may be slightly OT, but I was wondering why a process which heavily uses IO (say, cp-ing a big file from one location to another on the same disk) slows everything down, even processes which are mostly CPU bound. I have noticed this on both OSes I use heavily (Mac OS X and Linux).
In particular, I wonder why multi-core does not really help here: is it a limitation of commodity hardware (disk controller, etc.), an OS limitation, or is there something inherently hard about allocating the right resources (scheduling)?
It could be a limitation of the current scheduler. Google "Galbraith's sched:autogroup patch" or "linux miracle patch" (yes really!). There's apparently a 200-line patch in the process of being refined and merged which adds group scheduling, about which Linus says:
I'm also very happy with just what it does to interactive performance. Admittedly, my "testcase" is really trivial (reading email in a web-browser, scrolling around a bit, while doing a "make -j64" on the kernel at the same time), but it's a test-case that is very relevant for me. And it is a huge improvement.
Before-and-after videos here.
Because copying a large file (bigger than the available buffer cache) usually involves bringing it through the buffer cache, which generally causes less recently used pages to be thrown out; those pages must then be brought back in later.
Other processes which are doing small amounts of occasional IO (say, just stat'ing a directory) then get their cached pages blown away and must do physical reads to bring those pages back in.
Hopefully this will get fixed by a copy command which can detect this kind of thing and advise the kernel accordingly (e.g. with posix_fadvise), so that a large one-off bulk transfer of a file which does not need to be read again afterwards does not completely discard all clean pages from the buffer cache, which is what mostly happens now.
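A rough sketch of that idea in Python (assuming Linux and Python 3.3+ for os.posix_fadvise; a real copy command would also handle errors and metadata, and the per-chunk fsync deliberately trades speed for cache friendliness):

    import os

    def copy_without_polluting_cache(src, dst, chunk=8 * 1024 * 1024):
        """Copy src to dst, telling the kernel we won't need the cached pages again."""
        with open(src, "rb") as fin, open(dst, "wb") as fout:
            offset = 0
            while True:
                data = fin.read(chunk)
                if not data:
                    break
                fout.write(data)
                # Flush the written pages so DONTNEED can actually drop them,
                # then drop this chunk of both files from the page cache.
                fout.flush()
                os.fsync(fout.fileno())
                os.posix_fadvise(fin.fileno(), offset, len(data), os.POSIX_FADV_DONTNEED)
                os.posix_fadvise(fout.fileno(), offset, len(data), os.POSIX_FADV_DONTNEED)
                offset += len(data)

    copy_without_polluting_cache("bigfile.iso", "/tmp/bigfile.iso")  # hypothetical paths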
A high rate of IO operations usually means a high rate of interrupts that must be serviced by the CPU, which takes CPU time.
In the case of cp, it also uses a considerable amount of the available memory bandwidth, as each block of data is copied to and from userspace. This will also tend to eject data required by other processes from the CPU's caches and TLB, which will slow those processes down as they take cache misses.
Also, would you know a way to validate your hypothesis on Linux, e.g. the number of interrupts while doing IO-intensive operations?
Regarding interrupts, I'm guessing that caf's hypothesis is:
many interrupts per second;
interrupts are serviced by any/all CPUs;
therefore, interrupts flush the CPU caches.
The statistics you'd need to test that would be the number of interrupts per second per CPU.
I don't know whether it's possible to tie interrupts to a single CPU: see http://www.google.com/#q=cpu+affinity+interrupt for further details.
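If you want to collect that statistic yourself on Linux, /proc/interrupts has one column per CPU; here is a small sketch that samples it twice and prints the per-CPU rate (the parsing assumes the usual /proc/interrupts layout):

    import time

    def interrupts_per_cpu():
        """Total interrupt counts per CPU, summed over all IRQ lines in /proc/interrupts."""
        with open("/proc/interrupts") as f:
            cpus = f.readline().split()                  # header line: CPU0 CPU1 ...
            totals = [0] * len(cpus)
            for line in f:
                fields = line.split()[1:1 + len(cpus)]   # skip the IRQ label column
                for i, field in enumerate(fields):
                    if field.isdigit():
                        totals[i] += int(field)
        return dict(zip(cpus, totals))

    before = interrupts_per_cpu()
    time.sleep(1)
    after = interrupts_per_cpu()
    print({cpu: after[cpu] - before[cpu] for cpu in after})   # interrupts/second per CPU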
Here's something I don't understand (this is the first time I've looked at this question): perfmon on my laptop (running Windows Vista) is showing 2000 interrupts/second (1000 on each core) when it's almost idle (doing nothing but displaying perfmon). I can't imagine which device is generating 2000 interrupts/second, and I would have thought that's enough to blow away the CPU caches (my guess is that the CPU quantum for a busy thread is something like 50 msec). It's also showing an average of 350 DPCs/sec.
Does high-end hardware suffer from similar issues?
One type of hardware difference might be the disk hardware and disk device driver, generating more or fewer interrupts and/or other contention.

What is the best way to partition a terabyte drive in a Linux development machine? [closed]

I have a new 1 TB drive coming in tomorrow. What is the best way to divide this space for a development workstation?
The biggest problem I think I'm going to have is that some partitions (probably /usr) will become too small after a bit of use. Other partitions will probably be far too large. The swap partition, for example, is currently 2GB (2x 1GB RAM), but it is almost never used (only once that I know of).
If you partition your drive using LVM you won't have to worry about any individual partition running out of space in the future. Just move space around as necessary.
My standard strategy for normal "utility" boxes is to give them a swap partition twice the size of their RAM, a 1GB /boot partition and leave the rest as one vast partition. Whilst I see why some people want a separate /var, separate /home, etc., if I only have trusted users and I'm not running some production service, I don't think the reasons I've heard to date apply. Instead, I do my best to avoid any resizing, or any partition becoming too small - which is best achieved with one huge partition.
As for the size of swap and /boot: if your machine has 4GB of memory, you may not want to have double that in swap, though it's wise to at least have some. Even if you do have double, you're using a total of 9GB, or 0.9% of your new drive. /boot can be smaller than 1GB; this is just my standard "will never become full" size.
If you want a classic setup, I'd go for a 50GB "/" partition, for all your application goodness, and split the rest across users, or a full 950GB for a single user. Endless diskspace galore!
@wvdschel:
Don't create separate partitions for each user. Unused space on each partition is wasted.
Instead create one partition for all users. Use quota if necessary to limit each user's space. It's much more flexible than partitioning or LVM.
OTOH, one huge partition is usually a bit slower, depending on the file system.
I always setup LVM on Linux, and use the following layout to start with:
/ = 10GB
swap = 4GB
/boot = 100MB
/var = 5GB
/home = 10GB OR remainder of drive.
And then, later on, if I need more space, I can simply increase /home, /var or / as needed. Since I work a lot with XEN virtual machines, I tend to leave the remaining space unallocated so that I can quickly create LVM volumes for them.
Did you know 1TB can easily take up to half an hour to fsck? Workstations usually crash and reboot more often than servers, so that can get quite annoying. Do you really need all that space?
I would go with 1 GB for /boot, 100 GB for /, and the rest for /home. 1 GB is probably too much for /boot, but it's not like you'll miss it. 100 GB might seem like a lot for everything outside /home, until you start messing around with databases and realize that MySQL keeps its databases in /var; best to leave some room to grow in that area. The reason I recommend using a separate partition for /home is that when you want to completely switch distros, or if the upgrade option on your distro of choice doesn't work for whatever reason, or if you just want to start from scratch and do a clean system install, you can format / and /boot and leave /home with all the user data intact.
I would have two partitions: a small one (~20 GB) mounted on / to store all your programs, and a large one mounted on /home. Many people have mentioned a partition for /boot, but that is not really necessary. If you are worried about resizing, use LVM.
I give 40 GB to /, then however much RAM I have I give the same amount to swap, then the rest to /home.
Please tell me, what are you doing with /boot that you need more than 64 MB for it? Unless you never intend to clean it out, anything more is a waste of space. Kernel image + initrd + System.map won't take more than 10 MB (probably less; mine weigh 5 MB), and you really don't need to keep more than two spares.
And with the current prices of RAM, if you find yourself needing swap you'll be much better off buying more memory. Reserve 1 GB for swap and have something monitoring its usage (no swap at all is a bad idea, because the machine might lock up when it runs out of free memory).

Resources