High CPU usage - simple packet receiver on Linux

I'm writing a simple application under Linux that gathers all packets from the network. I'm using a blocking receive by calling "recvfrom()". When I generate a big network load with hping3 (~100k raw frames per second, 130 bytes each), the "top" tool shows high CPU usage for my process - about 37-38%. That is a big value for me. When I decrease the number of packets, the usage is lower - for example, top shows 3% for 4k frames per second.
I've checked DC++ while it downloads at ~10 MB/s, and its process uses about 5% of the CPU, not 38%. Is there any programmatic way in C to reduce the CPU usage and still receive a lot of frames?
My CPU:
Intel i5-2400 CPU @ 3.10GHz
My system:
Ubuntu 11.04 kernel 3.6.6 with PREEMPT-RT patch
And here is my code:
#include <stdlib.h>
#include <stdio.h>
#include <sys/mman.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <linux/if_arp.h>
#include <arpa/inet.h>

/* Socket descriptor. */
int mainSocket;

/* Buffer for one frame. */
unsigned char* buffer;

int main(int argc, char* argv[])
{
    /** Create socket. **/
    mainSocket = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (mainSocket == -1) {
        printf("Error: cannot create socket!\n");
        return 1;
    }

    /** Create buffer for frame. **/
    buffer = malloc(ETH_FRAME_LEN);

    printf("Listening...\n");
    while (1) {
        // Length of received packet
        int length = recvfrom(mainSocket, buffer, ETH_FRAME_LEN, 0, NULL, NULL);
        if (length > 0)
        {
            // ... do something ...
        }
    }
}

I don't know if this will help, but looking on Google I see that:
"Raw socket, Packet socket and Zero copy networking in Linux", as well as http://lxr.linux.no/linux+v2.6.36/Documentation/networking/packet_mmap.txt, talk about using PACKET_MMAP and mmap() to improve the performance of raw sockets (see the sketch at the end of this answer)
The Overview of Packet Reception suggests setting your process's affinity to match the CPU to which you bind the NIC using RPS.
Does DC++ do a promiscuous receive? I wouldn't have guessed so. So instead of comparing your performance to DC++, perhaps you should compare your performance to that of a utility built on libpcap.
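A minimal sketch of the PACKET_MMAP receive path those references describe, assuming TPACKET_V1 and a hypothetical ring geometry (64 blocks of 4 KiB holding 2 KiB frame slots); the kernel writes frames into a ring that is mapped once, so the process only flips a status word per frame instead of paying a copy and a syscall per packet:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <poll.h>
#include <sys/socket.h>
#include <sys/mman.h>
#include <linux/if_packet.h>
#include <linux/if_ether.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd == -1) {
        perror("socket");
        return 1;
    }

    /* Hypothetical ring geometry: tp_frame_nr must equal
     * blocks * frames-per-block, and block size must be a page multiple. */
    struct tpacket_req req;
    memset(&req, 0, sizeof req);
    req.tp_block_size = 4096;
    req.tp_block_nr   = 64;
    req.tp_frame_size = 2048;
    req.tp_frame_nr   = (req.tp_block_size / req.tp_frame_size) * req.tp_block_nr;

    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, &req, sizeof req) == -1) {
        perror("setsockopt(PACKET_RX_RING)");
        return 1;
    }

    /* Map the ring once; the kernel fills it directly. */
    unsigned char *ring = mmap(NULL, (size_t)req.tp_block_size * req.tp_block_nr,
                               PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ring == MAP_FAILED) {
        perror("mmap");
        return 1;
    }

    unsigned int i = 0;
    for (;;) {
        struct tpacket_hdr *hdr =
            (struct tpacket_hdr *)(ring + (size_t)i * req.tp_frame_size);

        if (!(hdr->tp_status & TP_STATUS_USER)) {
            /* Nothing ready in this slot: sleep until the kernel fills a frame. */
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            poll(&pfd, 1, -1);
            continue;
        }

        unsigned char *frame = (unsigned char *)hdr + hdr->tp_mac;
        /* ... process hdr->tp_snaplen bytes at frame ... */
        (void)frame;

        hdr->tp_status = TP_STATUS_KERNEL;   /* hand the slot back to the kernel */
        i = (i + 1) % req.tp_frame_nr;
    }
}

As with your original program, this needs CAP_NET_RAW (or root), and the ring parameters are only illustrative - tune them to your traffic.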

Maybe it's because the TCP/IP stack work is offloaded to the NIC and DC++ gets its stream of data directly from the NIC, so your processor is not doing any TCP/IP work. In your case, though, I think you are trying to get the data directly from the NIC, so it is not processed by the NIC but by your processor, and since you have an infinite loop fetching data, you are doing a lot of processing... hence the CPU usage spike.

Related

Why is the p-state status MSR on ryzen not changing?

I am trying to detect the current P-state of my CPU. I have noticed that the P-state status MSR (C001_0063) always returns 2 on my Ryzen 1700X system, even if the core is clearly not in that state. I think it used to work with the initial BIOS (v0403) that my motherboard came with, but that's not available for download anymore [1].
My CPU is overclocked [2] to 3.8 GHz. I used cpufreq-set to fix the speed and cpufreq-info to verify:
analyzing CPU 0:
driver: acpi-cpufreq
CPUs which run at the same hardware frequency: 0
CPUs which need to have their frequency coordinated by software: 0
maximum transition latency: 4294.55 ms.
hardware limits: 2.20 GHz - 3.80 GHz
available frequency steps: 3.80 GHz, 2.20 GHz
available cpufreq governors: ondemand, conservative, performance, schedutil
current policy: frequency should be within 3.80 GHz and 3.80 GHz.
The governor "performance" may decide which speed to use
within this range.
current CPU frequency is 3.80 GHz (asserted by call to hardware).
The following is a little test program that shows the value of the register for core #0, along with the effective speed relative to the P0 state. It needs root privileges. For me, it constantly prints pstate: 2, speed: 99% under load.
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>

int main(int argc, char** argv)
{
    uint64_t aperf_old = 0;
    uint64_t mperf_old = 0;
    int fd;

    /* Requires root and the msr kernel module. */
    fd = open("/dev/cpu/0/msr", O_RDONLY);
    if (fd < 0) {
        perror("open /dev/cpu/0/msr");
        return 1;
    }

    /* MSR C001_0061: P-State Current Limit. */
    uint64_t pstate_limits;
    pread(fd, &pstate_limits, sizeof(pstate_limits), 0xC0010061);
    printf("pstate ranges: %d to %d\n", (int)(pstate_limits & 0x07),
           (int)((pstate_limits >> 4) & 0x07));

    for (;;)
    {
        uint64_t pstate;
        uint64_t pstate_req;
        uint64_t aperf;
        uint64_t mperf;

        pread(fd, &pstate_req, sizeof(pstate_req), 0xC0010062); /* P-State Control (requested) */
        pread(fd, &pstate, sizeof(pstate), 0xC0010063);         /* P-State Status (current) */
        pread(fd, &aperf, sizeof(aperf), 0x000000E8);           /* APERF */
        pread(fd, &mperf, sizeof(mperf), 0x000000E7);           /* MPERF */

        printf("pstate: %d, requested: %d", (int)(pstate & 0x07), (int)(pstate_req & 0x07));
        if (mperf_old != 0 && mperf_old != mperf)
        {
            /* The APERF/MPERF delta ratio gives the effective speed relative to P0. */
            printf(", speed: %d%%", (int)(100 * (aperf - aperf_old) / (mperf - mperf_old)));
        }
        putchar('\n');

        mperf_old = mperf;
        aperf_old = aperf;
        sleep(1);
    }
}
A similar approach used to work on my FX-8350. What am I doing wrong? Test results also welcome.
System information:
CPU: Ryzen 1700X, P0 & P1 are 3.8 GHz [3], P2 is 2.2 GHz
Motherboard: Asus Prime X370-A, BIOS 3401
Operating system: Debian 7.1, kernel 4.9.0
Update: I have changed the code to print the requested pstate and that register is changing as expected. The actual cpu speed is changing too, as confirmed by various benchmarks.
[1] For some obscure reason, the bios backup function is disabled, so I couldn't make a copy before updating.
[2] I will run a test at defaults when I get a chance.
[3] No idea why it's duplicated.
Per-core (in reality: per CCD) P-States are in the undocumented MSR c0010293, bits 22:24, I think.
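If you want to check that guess, here is a small sketch in the style of the question's program that dumps the supposed field; the MSR address and the bit positions are taken from the answer above and are otherwise unverified:

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>

/* Undocumented per-CCD P-state status register, per the answer above. */
#define MSR_CCD_PSTATE_STAT 0xC0010293

int main(void)
{
    int fd = open("/dev/cpu/0/msr", O_RDONLY);   /* needs root and the msr module */
    if (fd < 0) {
        perror("open /dev/cpu/0/msr");
        return 1;
    }

    uint64_t val;
    if (pread(fd, &val, sizeof(val), MSR_CCD_PSTATE_STAT) != sizeof(val)) {
        perror("pread");
        return 1;
    }

    /* Bits 24:22 are assumed to hold the CCD's current P-state. */
    printf("raw: %#llx, ccd pstate: %d\n",
           (unsigned long long)val, (int)((val >> 22) & 0x7));

    close(fd);
    return 0;
}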
It may not be related, but I have heard that some people have successfully had their Ryzen 7s replaced by AMD due to P-states causing system stability issues on Unix or Unix-like systems. Commentary on the matter, especially on the ASRock forum, points towards a driver/firmware issue, though I have read elsewhere that AMD may be taking some responsibility for early batches of chips. That said, from what I know of P- and C-states, some states are requested by the operating system or the host virtualisation environment, so perhaps a patch could be done from within?

Number of cores rocket-chip

I am running Linux on top of Spike and the rocket-chip. In order to evaluate a program, I am trying to get the number of cores configured in Spike and the rocket-chip. I already tried to get the information through /proc/cpuinfo, with no success. I also wrote a little program:
#include <stdio.h>
#include <unistd.h>

int main()
{
    int numofcores = sysconf(_SC_NPROCESSORS_ONLN);
    printf("Core(s) : %d\n", numofcores);
    return 0;
}
The problem with this program is that it returns 1, which cannot be correct, because I configured 2 cores. Is there another way to get the number of cores?
Are you sure Linux can see both cores? You can check this with something like cat /proc/cpuinfo. To support multicore, you will need to turn on SMP support when building riscv-linux.
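If sysconf() keeps reporting 1, you can cross-check what the kernel itself sees by counting the processor entries in /proc/cpuinfo from C; a quick sketch (it assumes the usual one-"processor"-line-per-hart cpuinfo layout):

#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/cpuinfo", "r");
    if (!f) {
        perror("/proc/cpuinfo");
        return 1;
    }

    char line[256];
    int cores = 0;
    while (fgets(line, sizeof(line), f)) {
        /* Each online CPU gets its own "processor : N" entry. */
        if (strncmp(line, "processor", 9) == 0)
            cores++;
    }
    fclose(f);

    printf("Core(s) seen by the kernel: %d\n", cores);
    return 0;
}

If this also prints 1, the second core never came up, which points to SMP being disabled in the kernel configuration rather than to sysconf().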

How to set/get socket RTT in linux socket programming?

I need to set or get the RTT on a socket created with socket(AF_INET, SOCK_RAW, IPPROTO_TCP).
What do I need to do next to control the RTT in socket programming? In other words, how do I find such an RTT parameter?
On Linux you can get the RTT by calling getsockopt() with TCP_INFO:
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
/* ... */
struct tcp_info info;
socklen_t tcp_info_length = sizeof info;
int ret = getsockopt(sock, SOL_TCP, TCP_INFO, &info, &tcp_info_length);
printf("rtt: %u microseconds\n", info.tcpi_rtt);
To measure the Round Trip Time (RTT) write a simple client-server application where one node:
Reads the current time with clock_gettime()
Sends a message to the other node using write() on the (already opened) socket
Waits for the message to come back using read()
Reads the current time using clock_gettime()
The RTT is the difference between the two times (see the sketch below).
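A minimal sketch of that client-side timing loop, assuming sock is an already connected TCP socket and that the peer echoes back whatever it receives (the function name and message are made up for illustration):

#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Send one short message on an already connected socket, wait for the echo,
 * and return the round trip time in microseconds (or -1 on error). */
static long measure_rtt_us(int sock)
{
    struct timespec t0, t1;
    char msg[] = "ping";
    char buf[sizeof msg];

    clock_gettime(CLOCK_MONOTONIC, &t0);
    if (write(sock, msg, sizeof msg) != sizeof msg)
        return -1;
    if (read(sock, buf, sizeof buf) <= 0)   /* the peer must echo the message back */
        return -1;
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (t1.tv_sec - t0.tv_sec) * 1000000L +
           (t1.tv_nsec - t0.tv_nsec) / 1000L;
}

Call it in a loop and average the results if you want a smoother estimate.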

unexpected behavior of linux malloc

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pthread.h>

#define BLOCKSIZE 1024*1024
// #define BLOCKSIZE 4096

int main (int argc, char *argv[])
{
    void *myblock = NULL;
    int count = 0;

    while (1)
    {
        myblock = malloc(BLOCKSIZE);
        if (!myblock) {
            puts("error");
            break;
        }
        memset(myblock, 1, BLOCKSIZE);
        count++;
    }

    printf("Currently allocated %d\n", count);
    printf("end");
    exit(0);
}
When BLOCKSIZE is 1024*1024, all is OK: malloc() eventually returns NULL, the loop breaks, and the program prints its text and exits.
When BLOCKSIZE is 4096, malloc() never returns NULL and the program crashes => out of memory, killed by the kernel.
Why?
It's pitch black, you are likely to be eaten by an OOM killer.
Linux has this thing called an OOM killer which wanders about killing off processes when it finds memory allocation is very heavy. The selection of which process(es) to kill is based on certain properties of each process (such as one allocating a lot of memory being a prime candidate).
It does this, partly due to its optimistic memory allocation strategy (it will generally give you address space whether or not there's enough backing memory on devices for it, something known as overcommit).
It's likely in this case that, when allocating 1M at a time, an allocation fails before the OOM killer finds you. With 4K, you're discovered before the allocation routines decide you've had enough.
You can configure the OOM killer to leave you alone if that's your desire, by writing an adjustment value of -17 to your oom_adj entry in procfs. It's not advisable unless you know what you're doing, since it puts other (perhaps more important) processes at risk. Other values from -16 to +15 adjust the likelihood that your process will be selected.
You can also turn off overcommit altogether by writing vm.overcommit_memory=2 to /etc/sysctl.conf, but that again can present problems in your environment.
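For completeness, a small sketch of the oom_adj tweak done from inside the process itself; it uses the old oom_adj interface mentioned above (newer kernels prefer /proc/self/oom_score_adj, which takes -1000..+1000), and lowering the value needs privilege:

#include <stdio.h>

/* Write an adjustment value to this process's oom_adj entry.  -17 disables
 * the OOM killer for the process on the old interface; -16..+15 just bias
 * the selection.  Lowering the value needs CAP_SYS_RESOURCE / root. */
static int set_oom_adj(int adj)
{
    FILE *f = fopen("/proc/self/oom_adj", "w");
    if (!f)
        return -1;
    fprintf(f, "%d\n", adj);
    return fclose(f);
}

int main(void)
{
    if (set_oom_adj(-17) != 0)
        perror("oom_adj");
    /* ... now allocate away; the OOM killer should pick someone else ... */
    return 0;
}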

Linux shared memory allocation on x86_64

I have 64-bit RHEL Linux: Linux ipms-sol1 2.6.32-71.el6.x86_64 #1 SMP x86_64 x86_64 x86_64 GNU/Linux
RAM size = ~38GB
I changed the default shared memory limits as follows in /etc/sysctl.conf and loaded the changed file into memory with sysctl -p:
kernel.shmmni=81474836
kernel.shmmax=32212254720
kernel.shmall=7864320
Just as an experiment, I changed shmmax to 32 GB and tried allocating 10 GB in code using shmget() as given below. It fails to get 10 GB of shared memory in a single shot, but when I reduce my request to 8 GB it succeeds. Any clue as to where I am possibly going wrong?
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>

#define SHMSZ 10737418240

int main()
{
    int shmspaceid;
    key_t key = 5678;
    struct shmid_ds shmid;

    fprintf(stderr, "Changed code\n");

    if ((shmspaceid = shmget(key, SHMSZ, IPC_CREAT | 0666)) < 0) {
        fprintf(stderr, "ERROR memory allocation failed\n");
        return 1;
    }

    shmctl(shmspaceid, IPC_RMID, &shmid);
    return 0;
}
Regards
Himanshu
I'm not sure that this solution is applicable to shared memory as well, but I know this phenomenon from normal malloc() calls.
It's pretty common that you cannot allocate very large blocks of memory like this. What the function call means is "allocate me a block of contiguous memory of 10737418240 bytes". Often, even if the total system memory could theoretically satisfy this need, the implied "block of contiguous memory" requirement forces the limit of allocatable memory to be much lower.
The in-memory program structure and the number of programs loaded can all contribute to blocking certain areas of memory, leaving no 10 contiguous gigabytes that can be allocated.
I have often found that a reboot will change that (as programs get loaded to different positions on the heap). You can try out your maximum allocatable block size with something like this:
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    size_t i = 1024;
    int error = 0;

    while (!error) {
        char *a = malloc(i);
        error = (a == NULL);
        if (!error)
            printf("Successfully allocated %zu.\n", i);
        i *= 2;
    }
    return 0;
}
Hope this helps or is applicable here. I found this out while checking why I could not allocate close to maximum system memory to a JVM.
Shot in the dark: you don't have enough swap space. Shared memory, by default, requires reserving space in swap. You can disable this behavior using SHM_NORESERVE:
http://linux.die.net/man/2/shmget
SHM_NORESERVE (since Linux 2.6.15) This flag serves the same purpose
as the mmap(2) MAP_NORESERVE flag. Do not reserve swap space for this
segment. When swap space is reserved, one has the guarantee that it is
possible to modify the segment. When swap space is not reserved one
might get SIGSEGV upon a write if no physical memory is available. See
also the discussion of the file /proc/sys/vm/overcommit_memory in
proc(5).
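A sketch of what that looks like applied to the code from the question; SHM_NORESERVE may be missing from older glibc headers, so it is defined by hand here with the value from linux/shm.h:

#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

#ifndef SHM_NORESERVE
#define SHM_NORESERVE 010000   /* from linux/shm.h: don't reserve swap */
#endif

#define SHMSZ 10737418240UL    /* 10 GB */

int main(void)
{
    /* With SHM_NORESERVE the segment is created without reserving swap,
     * at the price of a possible SIGSEGV on write if memory runs out. */
    int id = shmget(5678, SHMSZ, IPC_CREAT | SHM_NORESERVE | 0666);
    if (id < 0) {
        fprintf(stderr, "shmget: %s\n", strerror(errno));
        return 1;
    }

    shmctl(id, IPC_RMID, NULL);
    return 0;
}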
I was just looking at this and I recommend printing out the exact errno value and description for the problem, rather than just noting that it failed. For example:
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <errno.h>
#include <string.h>

//#define SHMSZ 10737418240
#define SHMSZ 8589934592

int main()
{
    int shmspaceid;
    key_t key = 5678;
    struct shmid_ds shmid;

    if ((shmspaceid = shmget(key, SHMSZ, IPC_CREAT | 0666)) < 0) {
        fprintf(stderr, "ERROR with shmget (%d: %s)\n", (int)(errno), strerror(errno));
        return 1;
    }

    shmctl(shmspaceid, IPC_RMID, &shmid);
    return 0;
}
I tried to reproduce your problem with an 8 GB block and 8 GB shmmax and shmall on my 16 GB system, but I could not; it worked fine. I do recommend using ipcs -m to look for other shared blocks that might prevent your 10 GB allocation from being honored. And definitely look closely at the exact error code that shmget() returns through errno.
