Closed. This question is off-topic. It is not currently accepting answers.
Closed 10 years ago.
I learned that hard disk data is transferred to main memory using DMA, but that network stack data cannot use DMA and has to go through the processor. Is that true? If so, what are the ways to avoid this? Isn't it really inefficient to transfer data through the processor?
Most modern network cards, like most other hardware, also use DMA for data transfer. The confusion stems from the fact that the CPU has to process most of the data coming from user applications into the form the network card expects (for example, TCP packets inside Ethernet frames). This processing has to be done by the CPU, since the CPU implements the various network protocols used to send data.
Incidentally, the same can be said of hard drives. Though DMA is used for transferring large blocks of data from RAM to the hard disk, almost inevitably the CPU must verify that these blocks of data will be placed at the correct location and formatted to the correct filesystem type.
Is it true?
No! Network devices DMA into memory buffers specifically allocated for this purpose. DMA for network IO has been the general rule in the x86 world since the early 1990s, when the PCI bus emerged.
Isn't it really inefficient to transfer data through processor?
Yes, incredibly inefficient. After initialization, the only time a core interacts directly with a modern network card is to signal the "transmit doorbell". This doorbell is a lone write operation which tells the card to look into memory for new packets to transmit. All other interactions between core and network device take place indirectly via memory.
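To make the doorbell idea concrete, here is a toy model in Python. It is only a sketch: `ToyNic`, `driver_send` and the ring layout are hypothetical names invented for illustration, not any real driver or device API. The point it shows is that the "driver" only writes packets into a ring in memory, and the single call to `ring_doorbell` stands in for the lone register write.

```python
class ToyNic:
    """Hypothetical device model that reads descriptors from a shared ring."""

    def __init__(self, ring_size=8):
        self.ring = [None] * ring_size  # descriptor ring living in "host memory"
        self.head = 0   # next slot the device will read
        self.tail = 0   # next slot the driver will fill
        self.wire = []  # frames the device has "transmitted"

    def ring_doorbell(self, new_tail):
        """Stands in for the single register write from the CPU core."""
        while self.head != new_tail:
            # The device fetches the frame from memory by itself ("DMA read").
            self.wire.append(self.ring[self.head])
            self.ring[self.head] = None
            self.head = (self.head + 1) % len(self.ring)


def driver_send(nic, frames):
    """Queue frames in memory, then notify the device exactly once."""
    for frame in frames:
        next_tail = (nic.tail + 1) % len(nic.ring)
        if next_tail == nic.head:
            raise BufferError("transmit ring is full")
        nic.ring[nic.tail] = frame  # plain memory write, no device access
        nic.tail = next_tail
    nic.ring_doorbell(nic.tail)     # the only direct interaction with the card
```

Calling `driver_send(nic, [b"frame-1", b"frame-2"])` queues both frames and touches the "device" once, however many frames were queued.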
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 7 years ago.
I found at several places that Linux uses pages and a paging mechanism but I didn't find anywhere where this file is or how to configure it.
All the information I found is about the Linux swap file / partition. There is a difference between paging and swapping:
Paging moves pages (small frames that each contain a piece of data, usually 4 KB but this can vary between operating systems) from main memory to backing storage. It always happens, as a normal function of the operating system.
Swapping moves an entire process to storage. It happens when the system is under memory pressure or, on Windows 8, when a new application is hibernating.
Does Linux use its swap file / partition for both cases?
If so, how can I see how many pages are currently paged out? This information is not in the output of the vmstat, free or swapon commands (or I fail to see it).
Or is there another file used for paging?
If so, how can I configure it (and watch its usage)?
Or perhaps Linux does not use paging at all and I was misled?
I would appreciate answers specific to Red Hat Enterprise Linux versions 6 and 7, but a general answer covering all Linux distributions would also be good.
Thanks in advance.
On Linux, the swap partition(s) are used for paging.
Linux does not respond to memory pressure by swapping out whole processes. The virtual memory system does demand paging, page by page. Under extreme memory pressure, one or more processes will be killed by the OOM killer. (There are some useful links to documentation in the first NOTE in man malloc)
There is a line in the top header which shows swap partition usage, but if that is all the information you want, use
swapon -s
See man swapon for more information.
The swap partition usage is not the same as the number of unmapped pages. A page might be memory-mapped to a file using the mmap call; since that page has backing store in the file, there is no need to also write it to a swap partition, and the system won't use swap space for that. But swap partition usage is a pretty good indicator.
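If you want numbers rather than the top header, a small script can read them from /proc. This is a minimal sketch, assuming the usual Linux field names `SwapTotal` and `SwapFree` in /proc/meminfo and the cumulative `pswpin` / `pswpout` counters in /proc/vmstat; the function names are my own.

```python
def parse_counters(text):
    """Parse 'name value' or 'Name:   value kB' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.replace(":", " ").split()
        if len(parts) >= 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def swap_summary(meminfo_text, vmstat_text):
    """Summarise swap usage from the contents of /proc/meminfo and /proc/vmstat."""
    mem = parse_counters(meminfo_text)
    vm = parse_counters(vmstat_text)
    return {
        "swap_used_kb": mem["SwapTotal"] - mem["SwapFree"],
        "pages_swapped_in": vm["pswpin"],    # cumulative since boot
        "pages_swapped_out": vm["pswpout"],  # cumulative since boot
    }

# Typical use on a Linux machine:
#   with open("/proc/meminfo") as m, open("/proc/vmstat") as v:
#       print(swap_summary(m.read(), v.read()))
```

As noted above, this only covers pages that went to swap; pages dropped because they are backed by a mapped file do not show up in these counters.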
Also note that Linux (unlike Windows) does not allocate swap space for pages when they are allocated. Instead, it adds the new page to the virtual memory map without any backing store, and allocates the swap space only when the page needs to be swapped out. The consequence (as described in the malloc manpage referenced earlier) is that a malloc call may succeed in allocating virtual memory, but a subsequent attempt to use that virtual memory may fail.
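This behaviour is usually called memory overcommit, and on Linux the policy is exposed in /proc/sys/vm/overcommit_memory. Here is a small sketch that turns the value of that file into a description; the helper name and the wording of the descriptions are mine, the three modes are the ones documented in proc(5).

```python
OVERCOMMIT_MODES = {
    0: "heuristic overcommit (the usual default)",
    1: "always overcommit, never check",
    2: "strict accounting, never overcommit",
}


def describe_overcommit(raw_value):
    """Map the contents of /proc/sys/vm/overcommit_memory to a description."""
    mode = int(raw_value.strip())
    return OVERCOMMIT_MODES.get(mode, "unknown mode %d" % mode)

# Typical use on a Linux machine:
#   with open("/proc/sys/vm/overcommit_memory") as f:
#       print(describe_overcommit(f.read()))
```

In mode 2 an allocation is more likely to fail up front at malloc time instead of later when the memory is first used.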
Although Linux retains the term 'swap partition' as a historical relic, it actually performs paging. So your expectation is borne out; you were just thrown by the archaic terminology.
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
How is the OS able to do this?
With virtual memory, programs running on the system can allocate far
more memory than is physically available;
In practice it is "a little more memory", not "far more memory"; otherwise you will experience thrashing.
Every desktop, laptop or server processor has an MMU. It is used by the virtual memory system to provide a virtual address space through paging and the page cache. When the kernel gets a page fault, it can either fetch a page from disk (e.g. from a segment of an ELF executable or shared object, from some other mapped file, or from the swap area) or send a SIGSEGV signal; see signal(7).
On Linux, several system calls can change the address space: mmap(2) and munmap(2) (and also the obsolete sbrk, etc.) and execve(2). You can also advise the kernel using madvise(2).
You could use cat /proc/$somepid/maps (e.g. cat /proc/$$/maps in your shell) to understand the address space map of some process. See proc(5).
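If you would rather process that output than read it by eye, here is a minimal sketch of a parser for the line format documented in proc(5) (address range, permissions, offset, device, inode, optional pathname). `Mapping`, `parse_maps_line` and `total_mapped_bytes` are names I made up for the example.

```python
from collections import namedtuple

Mapping = namedtuple("Mapping", "start end perms offset dev inode path")


def parse_maps_line(line):
    """Parse one line of /proc/<pid>/maps into a Mapping tuple."""
    fields = line.split(None, 5)  # the pathname may be absent or contain spaces
    addr, perms, offset, dev, inode = fields[:5]
    path = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in addr.split("-"))
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), path)


def total_mapped_bytes(maps_text):
    """Sum the sizes of all mappings, i.e. the virtual address space in use."""
    return sum(
        m.end - m.start
        for m in (parse_maps_line(l) for l in maps_text.splitlines() if l.strip())
    )

# Typical use on a Linux machine:
#   with open("/proc/self/maps") as f:
#       print(total_mapped_bytes(f.read()))
```

The total is virtual size, not resident memory: many of those pages may never have been touched.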
Follow all the links above, and also read Advanced Linux Programming and Operating Systems: Three Easy Pieces.
Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
What happens when physical memory is fully occupied by processes and a new process (of similar priority) is introduced? How does the memory management unit handle the pages (resources) requested by the new and old processes (tasks of the same priority)?
In other words, how is swapping of memory done for processes of similar priority when physical memory is full? Please explain with an example.
You should not need to care about what happens in that case, and on current Linux desktops and laptops it is an improbable case (because the kernel usually steals pages from the filesystem cache).
When a new program is started with the execve(2) syscall, new memory mappings are set up (almost as if by mmap(2)), possibly with a copy-on-write mechanism. Once the program accesses them, the kernel takes a page fault and ultimately loads the page into physical RAM. It may have to choose which pages to steal. If they are dirty, it has to write them to some swap area (or to the mmap-ed file if the mapping is MAP_SHARED). Otherwise, it just reuses them (and reassigns the physical pages).
If all memory resources are used, memory overcommit may happen.
The MMU is used by the Linux kernel for virtual memory management. Applications see only a virtual address space (look into /proc/, e.g. with cat /proc/self/maps, to understand it).
The MMU performs the virtual-to-physical address translation and raises page faults. The kernel is responsible for configuring the MMU (i.e. setting up the virtual address space translation mechanism) and for handling page faults. These are usually invisible to the application (e.g. because the kernel fetches the page from the disk, the filesystem or the swap area), except for the SIGSEGV signal, which occurs when a "non-existent" page is accessed.
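You can watch those invisible page faults being counted from user space. This is a minimal sketch, assuming a Unix system where Python's `resource` module is available; `faults_while_touching` is a name I made up. It maps some anonymous memory, writes one byte to each page, and reports how many minor and major faults the process took meanwhile.

```python
import mmap
import resource


def faults_while_touching(n_pages):
    """Return (minor, major) page faults taken while touching n_pages pages."""
    page = mmap.PAGESIZE
    before = resource.getrusage(resource.RUSAGE_SELF)
    buf = mmap.mmap(-1, n_pages * page)  # anonymous mapping, nothing resident yet
    for i in range(n_pages):
        buf[i * page] = 1  # first write to a page makes the kernel supply it
    after = resource.getrusage(resource.RUSAGE_SELF)
    buf.close()
    return (after.ru_minflt - before.ru_minflt,
            after.ru_majflt - before.ru_majflt)
```

On a typical Linux box I would expect the minor count to grow roughly with `n_pages` and the major count (faults that needed disk I/O) to stay at or near zero, but the exact numbers depend on the kernel and its configuration.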
Please take the time to read all the links given here.
Closed. This question is off-topic. It is not currently accepting answers.
Closed 12 years ago.
This may be slightly OT, but I was wondering why a process that makes heavy use of IO (say, cp of a big file from one location to another on the same disk) slows everything down, even processes that are mostly CPU bound. I have noticed this on both operating systems I use heavily (Mac OS X and Linux).
In particular, I wonder why multi-core does not really help here: is it a hardware limitation of commodity hardware (disk controller, etc.), an OS limitation, or is there something inherently hard about allocating the right resources (scheduling)?
It could be a limitation of the current scheduler. Google "Galbraith's sched:autogroup patch" or "linux miracle patch" (yes really!). There's apparently a 200-line patch in the process of being refined and merged which adds group scheduling, about which Linus says:
I'm also very happy with just what it does to interactive performance.
Admittedly, my "testcase" is really trivial (reading email in a
web-browser, scrolling around a bit, while doing a "make -j64" on the
kernel at the same time), but it's a test-case that is very relevant
for me. And it is a huge improvement.
Before-and-after videos here.
Because copying a large file (bigger than the available buffer cache) usually involves bringing it through the buffer cache, which generally causes less recently used pages to be thrown out, and these must then be brought back in.
Other processes that do small amounts of occasional IO (say, just stat'ing a directory) then get their caches blown away and must do physical reads to bring those pages back in.
Hopefully this can be fixed by a copy command that detects this kind of situation and advises the kernel accordingly (e.g. with posix_fadvise), so that a large one-off bulk transfer of a file that does not need to be read again does not discard all clean pages from the buffer cache, which is what mostly happens now.
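Here is a rough sketch of what such a copy could look like, using Python's `os.posix_fadvise` wrapper (available on Linux, hence the `hasattr` guard). `copy_without_caching` and the 1 MiB chunk size are my own choices, and whether this actually helps on a given system is something you would have to measure.

```python
import os

CHUNK = 1 << 20  # 1 MiB per read; an arbitrary choice


def copy_without_caching(src_path, dst_path):
    """Copy a file while hinting that its pages need not stay in the cache."""
    can_advise = hasattr(os, "posix_fadvise")
    src = os.open(src_path, os.O_RDONLY)
    try:
        dst = os.open(dst_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            if can_advise:
                # Length 0 means "to the end of the file".
                os.posix_fadvise(src, 0, 0, os.POSIX_FADV_SEQUENTIAL)
            offset = 0
            while True:
                data = os.read(src, CHUNK)
                if not data:
                    break
                view = memoryview(data)
                while view:  # os.write may write fewer bytes than requested
                    view = view[os.write(dst, view):]
                if can_advise:
                    # We will not read this range again; let the kernel drop it.
                    os.posix_fadvise(src, offset, len(data),
                                     os.POSIX_FADV_DONTNEED)
                offset += len(data)
            os.fsync(dst)  # dirty pages cannot be dropped until written back
            if can_advise:
                os.posix_fadvise(dst, 0, 0, os.POSIX_FADV_DONTNEED)
        finally:
            os.close(dst)
    finally:
        os.close(src)
    return offset
```

The advice calls are only hints, so the kernel is free to ignore them; the copy itself is correct either way.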
A high rate of IO operations usually means a high rate of interrupts that must be serviced by the CPU, which takes CPU time.
In the case of cp, it also uses a considerable amount of the available memory bandwidth, as each block of data is copied to and from userspace. This will also tend to eject data required by other processes from the CPUs caches and TLB, which will slow down other processes as they take cache misses.
Also, would you know a way to validate your hypothesis on Linux, e.g. by counting interrupts during IO-intensive operations?
To do with interrupts, I'm guessing that caf's hypothesis is:
many interrupts per second;
interrupts are serviced by any/all CPUs;
therefore, interrupts flush the CPU caches.
The statistics you'd need to test that would be the number of interrupts per second per CPU.
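On Linux those per-CPU counts are available in /proc/interrupts. Here is a minimal sketch of how one could total them per CPU; the function names are mine, and it assumes the usual layout of that file (a header row of CPU names, then one row per interrupt source with one count per CPU). Sampling it twice and dividing the difference by the interval gives interrupts per second per CPU.

```python
def interrupts_per_cpu(proc_interrupts_text):
    """Sum the per-CPU counters found in the text of /proc/interrupts."""
    lines = proc_interrupts_text.splitlines()
    cpus = lines[0].split()  # header row: CPU0 CPU1 ...
    totals = dict.fromkeys(cpus, 0)
    for line in lines[1:]:
        _, _, rest = line.partition(":")
        counts = []
        for field in rest.split():
            if not field.isdigit():
                break  # reached the controller / device description
            counts.append(int(field))
        if len(counts) < len(cpus):
            continue  # summary rows such as ERR carry a single total
        for cpu, count in zip(cpus, counts):
            totals[cpu] += count
    return totals


def interrupt_rates(before, after, seconds):
    """Interrupts per second per CPU between two samples."""
    return {cpu: (after[cpu] - before[cpu]) / seconds for cpu in before}
```

Run it once while idle and once during the big cp, and compare the rates.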
I don't know whether it's possible to tie interrupts to a single CPU: see http://www.google.com/#q=cpu+affinity+interrupt for further details.
Here's something I don't understand (this is the first time I've looked at this question): perfmon on my laptop (running Windows Vista) is showing 2000 interrupts/second (1000 on each core) when it's almost idle (doing nothing but displaying perfmon). I can't imagine which device is generating 2000 interrupts/second, and I would have thought that's enough to blow away the CPU caches (my guess is that the CPU quantum for a busy thread is something like 50 msec). It's also showing an average of 350 DPCs/sec.
Does high-end hardware suffer from similar issues?
One type of hardware difference might be the disk hardware and disk device driver, generating more or fewer interrupts and/or other contentions.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 3 years ago.
I watched http://www.youtube.com/watch?v=96dWOEa4Djs (from http://www.joelonsoftware.com/items/2009/03/27.html) and was amazed by the improvement.
I have a good workstation (Sun Ultra M4, 2 AMD Opterons, 8 GB RAM, NVidia FX 1500), yet it feels only as fast as any other computer in the city (except when rendering).
I blame Windows for it (I can't use Linux because I run 3D Max), but now I wonder whether it is possible to improve the I/O.
I run VMs (1-3 at a time), 3D Max, Photoshop and Python, plus some video encoding and things like that.
I don't have enough money to buy an SSD, and I have 2 SATA drives. What can I do? Is it possible to mount a RAM drive on Windows? How do I use it?
Have you thought about using a RAID array? You can get some decent I/O improvements from a RAID-0 configuration.
Although I must ask: are you sure your bottleneck is disk I/O and not memory or CPU? In my experience disk I/O has traditionally been the last bottleneck on a machine (especially on large-scale machines), and more often than not memory, poor use of pagefiles and CPU throughput have been the pressure points.
Sounds like you're probably CPU bound. All the programs you listed depend highly on memory and CPU rather than disk speed. Since it looks like you have plenty of memory I'm guessing it's mostly CPU slowing you down.
If you really do wish to improve your disk performance without spending much money, you can try putting your disks in a RAID 0 setup. This will make your computer treat them as one large data storage volume and speed things up by reading from both disks simultaneously. Keep in mind that this also increases the likelihood that you will lose data, since the RAID volume could become corrupt or one of the disks could fail (causing the data on both disks to be lost).
Alternatively, you can try buying a faster disk drive. Newegg sells Western Digital Raptor drives (currently the fastest SATA non-SSD disks available) for between $100 and $150 (after rebate): http://www.newegg.com/Product/ProductList.aspx?Submit=ENE&N=40000014&Description=raptor&name=Internal%20Hard%20Drives. These could give you a 20-30%+ boost in disk IO, depending on how good your current drives are.
Go for UltraSCSI if you want more disk I/O bandwidth. But do not measure your disk speed by how fast programs load. Better disk subsystems and/or configurations (such as RAID) are only useful for transferring large data blocks, e.g. video/audio editing, not for loading operating system files or application executables.
Did you scan your computer for spyware? ;)